Abstract



Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality, which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running times achieve these optimal bounds. An answer to this question may be interesting to database practice, as it is known that any algorithm based on the traditional select-project-join style plans typically employed in an RDBMS is asymptotically slower than the optimal for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. This bound implies two famous inequalities in geometry: the Loomis-Whitney inequality and the Bollobás-Thomason inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.

1 Introduction

Recently, Grohe and Marx [13] and Atserias, Grohe, and Marx [4] (AGM's results henceforth) derived tight bounds on the number of output tuples of a full conjunctive query¹ in terms of the sizes of the relations mentioned in the query's body. As query output size estimation is fundamentally important for efficient query processing, these results have generated a great deal of excitement.

To understand the spirit of AGM's results, consider the following example where we have a schema with three attributes, $A$, $B$, and $C$, and three relations, $R(A,B)$, $S(B,C)$ and $T(A,C)$, defined over those attributes. Consider the following natural join query:

$$q = R \bowtie S \bowtie T \qquad (1)$$

Let $q(I)$ denote the set of tuples that is output from applying $q$ to a database instance $I$, that is, the set of triples of constants $(a,b,c)$ such that $R(a,b)$, $S(b,c)$, and $T(a,c)$ are in $I$. Our goal is to bound the number of tuples returned by $q$ on $I$, denoted by $|q(I)|$, in terms of $|R|$, $|S|$, and $|T|$. For simplicity, let us consider the case when $|R| = |S| = |T| = N$. A straightforward bound is $|q(I)| \le N^3$. One can obtain a better bound by noticing that any pairwise join (say $R \bowtie S$) will contain $q(I)$ in it, as $R$ and $S$ together contain all attributes (or they "cover" all the attributes). This leads to the bound $|q(I)| \le N^2$. AGM showed that one can get a better upper bound of $|q(I)| \le N^{3/2}$ by generalizing the notion of cover to a so-called "fractional cover" (see Section 2). Moreover, this estimate is tight in the sense that for infinitely many values of $N$, one can find a database instance $I$ for which $|q(I)| = N^{3/2}$. These non-trivial estimates are exciting to database researchers as they offer previously unknown methods to estimate the cardinality of a query result - a fundamental problem to support efficient query processing.

¹ A full conjunctive query is a conjunctive query where every variable in the body appears in the head.
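As a sanity check on the tightness claim, the following Python sketch builds the grid instance $R = S = T = [m] \times [m]$, for which $N = m^2$ and the join has $m^3 = N^{3/2}$ tuples. The construction and the helper names `grid_instance` and `triangle_join` are illustrative assumptions, not taken from the text; the join is evaluated naively, only to count its output.

```python
from itertools import product

def grid_instance(m):
    """Return R(A,B), S(B,C), T(A,C), each equal to [m] x [m], so N = m * m."""
    pairs = set(product(range(m), repeat=2))
    return pairs, set(pairs), set(pairs)

def triangle_join(R, S, T):
    """Naive evaluation of q = R join S join T, used only to count the output."""
    s_by_b = {}
    for b, c in S:
        s_by_b.setdefault(b, []).append(c)
    return {(a, b, c)
            for (a, b) in R
            for c in s_by_b.get(b, [])
            if (a, c) in T}

m = 4
R, S, T = grid_instance(m)
N = len(R)
out = triangle_join(R, S, T)
print(N, len(out))  # 16 64, i.e. |q(I)| = N ** 1.5
```

On this instance the bounds $N^3$ and $N^2$ are loose, while $N^{3/2}$ is met with equality.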

More generally, given an arbitrary natural-join query $q$ and given the sizes of input relations, the AGM method can generate an upper bound $U$ such that $|q(I)| \le U$, where $U$ depends on the "best" fractional cover of the attributes. This "best" fractional cover can be computed by a linear program (see Section 2 for more details). Henceforth, we refer to this inequality as AGM's fractional cover inequality, and the bound $U$ as AGM's fractional cover bound. They also show that the bound is essentially optimal in the sense that for infinitely many sizes of input relations, there exists an instance $I$ such that each relation in $I$ is of the prescribed size and $|q(I)| = U$.

AGM's results leave open whether one can compute the actual set $q(I)$ in time $O(U)$. In fact, AGM observed this issue and presented an algorithm that computes $q(I)$ with a running time of $O(|q|^2 \cdot U \cdot N)$, where $N$ is the cardinality of the largest input relation and $|q|$ denotes the size of the query $q$. AGM establish that their join-project plan can in some cases be super-polynomially better than any join-only plan. However, AGM's join algorithm is not optimal. Even on the above example of (1), we can construct a family of database instances $I_1, I_2, \dots, I_N, \dots$, such that in the $N$th instance $I_N$ we have $|R| = |S| = |T| = N$ and both AGM's algorithm and any join-only plan take $\Omega(N^2)$ time, even though from AGM's bound we know that $|q(I)| \le U = N^{3/2}$, which is the best worst-case run-time one can hope for.

The $\sqrt{N}$-gap on a small example motivates our central question. In what follows, natural join queries are defined as the join of a set of relations $R_1, \dots, R_m$.

Optimal Worst-case Join Evaluation Problem (Optimal Join Problem). Given a fixed database schema $\bar{R} = \{R_i(\bar{A}_i)\}_{i=1}^{m}$ and an $m$-tuple of integers $\bar{N} = (N_1, \dots, N_m)$. Let $q$ be the natural join query joining the relations in $\bar{R}$ and let $I(\bar{N})$ be the set of all instances such that $|R_i^I| = N_i$ for $i = 1, \dots, m$. Define $U = \sup_{I \in I(\bar{N})} |q(I)|$. Then, the optimal worst-case join evaluation problem is to evaluate $q$ in time $O(U + \sum_{i=1}^{m} N_i)$.

Since any algorithm to produce $q(I)$ requires time at least $|q(I)|$, an algorithm that solves the above problem would have an optimal worst-case data-complexity.² (Note that we are mainly concerned with data complexity and thus the $O(U)$ bound above ignores the dependence on $|q|$. Our results have a small $O(|q|)$ factor.)

Implicitly, this problem has been studied for over three decades: a modern RDBMS uses decades of highly tuned algorithms to efficiently produce query results. Nevertheless, as we described above, such systems are asymptotically suboptimal - even in the above simple example of (1). Our main result is an algorithm that achieves asymptotically optimal worst-case running times for all conjunctive join queries.

We begin by describing connections between AGM's inequality and a family of inequalities in geometry. In particular, we show that AGM's inequality is equivalent to the discrete version of a geometric inequality proved by Bollobás and Thomason ([7], Theorem 2). This equivalence is shown in Section 3.

Our ideas for an algorithm solving the optimal join problem begin by examining a special case of the Bollobás-Thomason (BT) inequality: the classic Loomis-Whitney (LW) inequality [24]. The LW inequality bounds the measure of an $n$-dimensional set in terms of the measures of its $(n-1)$-dimensional projections onto the coordinate hyperplanes. The query (1) and its bound $|q(I)| \le \sqrt{|R||S||T|}$ is exactly the LW inequality with $n = 3$ applied to the discrete measure. Our algorithmic development begins with a slight generalization of the query $q$ in (1). We describe an algorithm for join queries which have the same format as in the LW inequality setup with $n \ge 3$. In particular, we consider "LW instances" of the optimal join problem, where the query is to join $n$ relations whose attribute sets are all the distinct $(n-1)$-subsets of a universe of $n$ attributes. Since the LW inequality is tight, and our join algorithm has running time that is asymptotically data-optimal for this class of queries (e.g., $O(N^{3/2})$ in our motivating example), our algorithm is data-complexity optimal in the worst case for LW instances.

Our algorithm for LW instances exhibits a key twist compared to a conventional join algorithm. The twist is that the join algorithm partitions the values of the join key on each side of the join into two sets: those values that are heavy and those values that are light. Intuitively, a value of a join key is heavy if its fanout is high enough so that joining all such join keys could violate the size bound (e.g., $N^{3/2}$ above). The art is selecting the precise fanout threshold for when a join key is heavy. This per-tuple choice of join strategy is not typically done in standard RDBMS join processing.

² In an RDBMS, one computes information, e.g., indexes, offline that may obviate the need to read the entire input relations to produce the output. In a similar spirit, we can extend our results to evaluate any query $q$ in time $O(U)$, removing the term $\sum_i N_i$ by precomputing some indices.

Building on our algorithm for LW instances, we next describe our main result: an algorithm to solve the optimal 
join problem for all join queries. In particular, we design an algorithm for evaluating join queries which not only 
proves AGM's fractional cover inequality without using the information-theoretic Shearer's inequality, but also has 
a running time that is linear in the bound (modulo pre-processing time). As AGM's inequality implies the BT and 
LW inequalities, our result is the first algorithmic proof of these geometric inequalities as well. To do this, we must 
carefully select which projections of relations to join and in which order our algorithm joins relations on a "per tuple" 
basis as in the LW-instance case. Our algorithm computes these orderings, and then at each stage it performs an 
algorithm that is similar to the algorithm we used for LW instances. 

Our example also shows that standard join algorithms are suboptimal. The question is: when do classical RDBMS algorithms have higher worst-case run-time than our proposed approach? AGM's analysis of their join-project algorithm leads to a worst-case run-time complexity that is a factor of the largest relation worse than AGM's bound. To investigate whether AGM's analysis is tight or not, we ask a sharper variant of this question: Given a query $q$, does there exist a family of instances $I$ such that our algorithm runs asymptotically faster than a standard binary-join-based plan or AGM's join-project plan? We give a partial answer to this question by describing a sufficient syntactic condition for the query $q$ such that for each $k \ge 2$, we can construct a family of instances where each relation is of size $N$ such that any binary-join plan as well as AGM's algorithm will need time $\Omega(N^2/k^2)$, while the fractional cover bound is $O(N^{1+1/(k-1)})$ - an asymptotic gap. We then show through a more detailed analysis that our algorithm on these instances takes $O(k^2 N)$ time.

We consider several extensions and improvements of our main result. In terms of the dependence on query size, our algorithms are also efficient (at most linear in $|q|$, which is better than the quadratic dependence in AGM) for full queries, but they are not necessarily optimal. In particular, if each relation in the schema has arity 2, we are able to give an algorithm with better query complexity than our general algorithm. This shows that in general our algorithm's dependence on the factors of the query is not the best possible. We also consider computing a relaxed notion of joins and give worst-case optimal algorithms for this problem as well.

Outline The remainder of the paper is organized as follows: in the rest of this section, we describe related work. In Section 2 we describe our notation and formulate the main problem. Section 3 proves the connection between AGM's inequality and the BT inequality. In Section 4 we present a data-optimal join algorithm for LW instances, and then extend this to arbitrary join queries in Section 5. We discuss the limits of performance of prior approaches and our approach in more detail in Section 6. In Section 7 we describe several extensions. We conclude in Section 8.

Related Work 

Grohe and Marx [13] made the first (implicit) connection between fractional edge cover and the output size of a conjunctive query. (Their results were stated for constraint satisfaction problems.) Atserias, Grohe, and Marx [4] extended Grohe and Marx's results to the database setting.

The first relevant result from AGM is the following inequality. Consider a join query over relations $R_e$, $e \in E$, where $E$ is a collection of subsets of an attribute "universe" $V$, and relation $R_e$ is on attribute set $e$. Then, the number of output tuples is bounded above by $\prod_{e \in E} |R_e|^{x_e}$, where $x = (x_e)_{e \in E}$ is an arbitrary fractional cover of the hypergraph $H = (V,E)$.

They also showed that this bound is tight. In particular, for infinitely many positive integers $N$ there is a database instance with $|R_e| = N$, $\forall e \in E$, and the upper bound gives the actual number of output tuples. When the sizes $|R_e|$ are given as inputs to the (output size estimation) problem, the best upper bound is obtained by picking the fractional cover $x$ which minimizes the linear objective function $\sum_{e \in E} (\log |R_e|) \cdot x_e$. In this "size constrained" case, however, their lower bound is off from the upper bound by a factor of $2^n$, where $n$ is the total number of attributes. AGM also presented an inapproximability result which justifies this gap. Note, however, that the gap is only dependent on the query size and the bound is still asymptotically optimal in the data-complexity sense.

The second relevant result from AGM is a join-project plan with running time $O\left(|q|^2 N_{\max}^{1 + \sum_e x_e}\right)$, where $N_{\max}$ is the maximum size of input relations and $|q| = |V| \cdot |E|$ is the query size.

AGM's inequality contains as a special case the discrete versions of two well-known inequalities in geometry: the Loomis-Whitney (LW) inequality [24] and its generalization, the Bollobás-Thomason (BT) inequality [7]. There are two typical proofs of the discrete LW and BT inequalities. The first proof is by induction using Hölder's inequality [7]. The second proof (see Lyons and Peres [25]) essentially uses "equivalent" entropy inequalities by Han [15] and its generalization by Shearer [8], which was also the route Grohe and Marx [13] took to prove AGM's bound. All of these proofs are non-constructive.

There are many applications of the discrete LW and BT inequalities. The $n = 3$ case of the LW inequality was used to prove communication lower bounds for matrix multiplication on distributed memory parallel computers [19]. The inequality was used to prove submultiplicativity inequalities regarding sums of sets of integers [14]. In [23], a special case of the BT inequality was used to prove a network-coding bound. Recently, some of the authors of this paper have used our algorithmic version of the LW inequality to design new sub-linear time decodable compressed sensing matrices [10] and efficient pattern matching algorithms [28].

Inspired by AGM's results, Gottlob, Lee, and Valiant [11] provided bounds for conjunctive queries with functional dependencies. For these bounds, they defined a new notion of "coloring number" which comes from the dual linear program of the fractional cover linear program. This allowed them to generalize previous results to all conjunctive queries, and to study several problems related to treewidth.

Join processing algorithms are among the most studied algorithms in database research. A staggering number of variants have been considered; we list a few: Block-Nested loop join, Hash-Join, Grace, Sort-merge (see Graefe [12] for a survey). Conceptually, it is interesting that none of the classical algorithms consider performing a per-tuple cardinality estimation as our algorithm does. It is interesting future work to implement our algorithm to better understand its performance.

Related to the problem of estimating the size of an output is cardinality estimation. A large number of structures have been proposed for cardinality estimation [1, 9, 17, 20, 21, 30]. They have all focused on various sub-classes of queries, and deriving estimates for arbitrary query expressions has involved making statistical assumptions, such as the independence and containment assumptions, which result in large estimation errors [18]. Follow-up work has considered sophisticated probability models, e.g., entropy-based models [26, 32] and graphical models [33]. In contrast, in this work we examine the worst-case behavior of algorithms in terms of their cardinality estimates. In the special case when the join graph is acyclic, there are several known results which achieve (near) optimal run time with respect to the output size [29, 35].

On a technical level, the work on adaptive query processing is related, e.g., Eddies [5] and RIO [6]. The main idea is that to compensate for bad statistics, the query plan may adaptively be changed (as it better understands the properties of the data). While both our method and these methods are adaptive in some sense, our focus is different: this body of work focuses on heuristic optimization methods, while our focus is on provable worst-case running time bounds. A related idea has been considered in practice: heuristics that split tuples based on their fanout have been deployed in modern parallel databases to handle skew [36]. This idea was not used to theoretically improve the running time of join algorithms. We are excited by the fact that a key mechanism used by our algorithm has been implemented in a modern commercial system.

2 Notation and Formal Problem Statement 

We assume the existence of a set of attribute names $\mathcal{A} = A_1, \dots, A_n$ with associated domains $D_1, \dots, D_n$ and an infinite set of relational symbols $R_1, R_2, \dots$. A relational schema for the symbol $R_i$ of arity $k$ is a tuple $\bar{A}_i = (A_{i_1}, \dots, A_{i_k})$ of distinct attributes that defines the attributes of the relation. A relational database schema is a set of relational symbols and associated schemas denoted by $R_1(\bar{A}_1), \dots, R_m(\bar{A}_m)$. A relational instance for $R(A_{i_1}, \dots, A_{i_k})$ is a subset of $D_{i_1} \times \cdots \times D_{i_k}$. A relational database $I$ is an instance for each relational symbol in the schema, denoted by $R_i^I$. A natural join query (or simply query) $q$ is specified by a finite subset of relational symbols $q \subseteq \mathbb{N}$, denoted by $\bowtie_{i \in q} R_i$. Let $\bar{A}(q)$ denote the set of all attributes that appear in some relation in $q$, that is, $\bar{A}(q) = \{A \mid A \in \bar{A}_i \text{ for some } i \in q\}$. Given a tuple $t$ we will write $t_{\bar{A}}$ to emphasize that its support is the attribute set $\bar{A}$. Further, for any $\bar{S} \subset \bar{A}$ we let $t_{\bar{S}}$ denote $t$ restricted to $\bar{S}$. Given a database instance $I$, the output of the query $q$ on $I$ is denoted $q(I)$ and is defined as

$$q(I) \stackrel{\mathrm{def}}{=} \left\{ t \in D^{\bar{A}(q)} \mid t_{\bar{A}_i} \in R_i^I \text{ for each } i \in q \right\}$$

where $D^{\bar{A}(q)}$ is a shorthand for $\times_{i : A_i \in \bar{A}(q)} D_i$.

We also use the notion of a semijoin: Given two relations $R(\bar{A})$ and $S(\bar{B})$, their semijoin $R \ltimes S$ is defined by

$$R \ltimes S \stackrel{\mathrm{def}}{=} \left\{ t \in R : \exists u \in S \text{ s.t. } t_{\bar{A} \cap \bar{B}} = u_{\bar{A} \cap \bar{B}} \right\}.$$

For any relation $R(\bar{A})$, and any subset $\bar{S} \subseteq \bar{A}$ of its attributes, let $\pi_{\bar{S}}(R)$ denote the projection of $R$ onto $\bar{S}$, i.e.

$$\pi_{\bar{S}}(R) = \left\{ t_{\bar{S}} \mid \exists t_{\bar{A} \setminus \bar{S}}, (t_{\bar{S}}, t_{\bar{A} \setminus \bar{S}}) \in R \right\}.$$

For any tuple $t_{\bar{S}}$, define the $t_{\bar{S}}$-section of $R$ as

$$R[t_{\bar{S}}] = \pi_{\bar{A} \setminus \bar{S}}\left(R \ltimes \{t_{\bar{S}}\}\right).$$

From Join Queries to Hypergraphs A query $q$ on attributes $\bar{A}(q)$ can be viewed as a hypergraph $H = (V, E)$ where $V = \bar{A}(q)$ and there is an edge $e_i = \bar{A}_i$ for each $i \in q$. Let $N_e = |R_e|$ be the number of tuples in $R_e$. From now on we will use the hypergraph and the original notation for the query interchangeably.

We use this hypergraph to introduce the fractional edge cover polytope that plays a central role in our technical developments. The fractional edge cover polytope defined by $H$ is the set of all points $x = (x_e)_{e \in E} \in \mathbb{R}^E$ such that

$$\sum_{e : v \in e} x_e \ge 1, \text{ for any } v \in V$$
$$x_e \ge 0, \text{ for any } e \in E$$

Note that the solution $x_e = 1$ for $e \in E$ is always feasible for hypergraphs representing join queries. A point $x$ in the polytope is also called a fractional (edge) cover solution of the hypergraph $H$.

Atserias, Grohe, and Marx [4] establish that, for any point $x = (x_e)_{e \in E}$ in the fractional edge cover polytope,

$$\left| \bowtie_{e \in E} R_e \right| \le \prod_{e \in E} N_e^{x_e}. \qquad (2)$$
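To make the polytope and the right-hand side of (2) concrete, here is a minimal Python sketch that checks whether a weight vector is a fractional edge cover and evaluates the bound. The function names and the dictionary encoding of the hypergraph are assumptions made for illustration only.

```python
import math

def is_fractional_cover(edges, x, tol=1e-9):
    """edges: dict edge-name -> set of attributes; x: dict edge-name -> weight."""
    if any(w < -tol for w in x.values()):
        return False
    vertices = set().union(*edges.values())
    # Every attribute must receive total weight at least 1 from the edges containing it.
    return all(
        sum(x[e] for e, attrs in edges.items() if v in attrs) >= 1 - tol
        for v in vertices
    )

def agm_bound(sizes, x):
    """Right-hand side of inequality (2): the product of N_e ** x_e."""
    return math.prod(sizes[e] ** x[e] for e in sizes)

edges = {"R": {"A", "B"}, "S": {"B", "C"}, "T": {"A", "C"}}
sizes = {"R": 100, "S": 100, "T": 100}
half = {"R": 0.5, "S": 0.5, "T": 0.5}        # fractional cover
integral = {"R": 1.0, "S": 1.0, "T": 0.0}    # ordinary cover by R and S
not_cover = {"R": 1.0, "S": 0.0, "T": 0.0}   # attribute C is uncovered

print(agm_bound(sizes, half), agm_bound(sizes, integral))
```

With all three relations of size 100, the cover $x_e = 1/2$ gives the bound $100^{3/2} = 1000$, while the integral cover by $R$ and $S$ gives only $100^2$.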



The bound is proved nonconstructively using Shearer's entropy inequality [8]. However, AGM provide an algorithm based on join-project plans that runs in time $O(|q|^2 \cdot N_{\max}^{1 + \sum_e x_e})$, where $N_{\max} = \max_{e \in E} N_e$. They observed that for a fixed hypergraph $H$ and given sizes $N_e$, the bound (2) can be minimized by solving the linear program which minimizes the linear objective $\sum_e (\log N_e) \cdot x_e$ over fractional edge cover solutions $x$. (Since in linear time we can figure out if we have an empty relation, and hence an empty output, for the rest of the paper we are always going to assume that $N_e \ge 1$.) Thus, the formal problem that we consider, recast in this language, is:

Definition 2.1 (OJ Problem - Optimal Join Problem). With the notation above, design an algorithm to compute $\bowtie_{e \in E} R_e$ with running time

$$O\left( f(|V|, |E|) \cdot \left( \prod_{e \in E} N_e^{x_e} + \sum_{e \in E} N_e \right) \right).$$

Here $f(|V|, |E|)$ is ideally a polynomial with (small) constant degree, which only depends on the query size. The linear term $\sum_{e \in E} N_e$ is to read the input. Hence, such an algorithm would be data-optimal in the worst case.³

We recast our motivating example from the introduction in our notation. Recall we are given $R(A,B)$, $S(B,C)$, $T(A,C)$, so $V = \{A, B, C\}$ and there are three edges corresponding to $R$, $S$, and $T$ respectively, namely $E = \{\{A,B\}, \{B,C\}, \{A,C\}\}$. Thus, $|V| = 3$ and $|E| = 3$. If we are given that $N_e = N$, one can check that the optimal solution to the LP is $x_e = \frac{1}{2}$ for $e \in E$, which has the objective value $\frac{3}{2} \log N$; in turn, this gives $\sup_{I \in I(\bar{N})} |q(I)| \le N^{3/2}$ (recall $I(\bar{N}) = \{I : |R_e^I| = N_e \text{ for } e \in E\}$).
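For readers who want to check the optimum by hand, the linear program for this example can be written out explicitly. The variable names $x_R, x_S, x_T$ and the summing argument are supplied here as a worked check and are not part of the original text.

```latex
\begin{align*}
\min\ & (\log N)\,(x_R + x_S + x_T) \\
\text{s.t. } & x_R + x_T \ge 1 \ \ (\text{attribute } A), \quad
               x_R + x_S \ge 1 \ \ (\text{attribute } B), \quad
               x_S + x_T \ge 1 \ \ (\text{attribute } C), \\
             & x_R,\ x_S,\ x_T \ge 0 .
\end{align*}
% Adding the three cover constraints gives 2 (x_R + x_S + x_T) >= 3,
% so every feasible point has objective value at least (3/2) log N,
% and x_R = x_S = x_T = 1/2 attains it.
```

Adding the three cover constraints shows that no feasible point can have objective value below $\frac{3}{2} \log N$, and the point $x_e = \frac{1}{2}$ attains this value.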



³ Following GLV [11], we assume in this work that given relations $R$ and $S$ one can compute $R \bowtie S$ in time $O(|R| + |S| + |R \bowtie S|)$. This only holds in an amortized sense (using hashing). To achieve true worst-case results, one can use sorting operations, which results in a log factor increase in running time.



Example 2.2. Given an even integer $N$, we construct an instance $I_N$ such that (1) $|R^{I_N}| = |S^{I_N}| = |T^{I_N}| = N$, (2) $|R \bowtie S| = |R \bowtie T| = |S \bowtie T| = N^2/4 + N/2$, and (3) $|R \bowtie S \bowtie T| = 0$. The following instance satisfies all three properties:

$$R^{I_N} = S^{I_N} = T^{I_N} = \{(0, j)\}_{j=1}^{N/2} \cup \{(j, 0)\}_{j=1}^{N/2}.$$

For example,

$$R \bowtie S = \{(i, 0, j)\}_{i,j=1}^{N/2} \cup \{(0, i, 0)\}_{i=1,\dots,N/2}$$

and $R \bowtie S \bowtie T = \emptyset$. Thus, any standard join-based algorithm takes time $\Omega(N^2)$. We show later that AGM's algorithm takes $\Omega(N^2)$ time too. Recall that the AGM bound for this instance is $O(N^{3/2})$, and our algorithm thus takes time $O(N^{3/2})$. In fact, as shall be shown later, on this particular family of instances both of our algorithms take only $O(N)$ time.
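The three properties of Example 2.2 are easy to confirm mechanically for a small $N$. The sketch below is only a check of the example; the helper names `example_instance` and `join_pair` are hypothetical, and only the pairwise join $R \bowtie S$ is verified (the other two pairwise joins are symmetric).

```python
def example_instance(N):
    """The relation used for R, S, and T in Example 2.2 (N must be even)."""
    half = N // 2
    return ({(0, j) for j in range(1, half + 1)}
            | {(j, 0) for j in range(1, half + 1)})

def join_pair(R, S):
    """R(A,B) join S(B,C), returned as a set of (a, b, c) triples."""
    by_b = {}
    for b, c in S:
        by_b.setdefault(b, []).append(c)
    return {(a, b, c) for (a, b) in R for c in by_b.get(b, [])}

N = 10
R = S = T = example_instance(N)
RS = join_pair(R, S)
# Filter the pairwise join against T(A,C) to obtain the full triangle join.
RST = {(a, b, c) for (a, b, c) in RS if (a, c) in T}
print(len(R), len(RS), len(RST))  # 10 30 0
```

For $N = 10$ the intermediate result has $N^2/4 + N/2 = 30$ tuples although the final output is empty, which is exactly the behavior that penalizes any plan built from pairwise joins.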

3 Connections to Geometric Inequalities 

We describe the Bollobás-Thomason (BT) inequality from discrete geometry and prove that the BT inequality is equivalent to AGM's inequality. We then look at a special case of the BT inequality, the Loomis-Whitney (LW) inequality, from which our algorithmic development starts in the next section. We state the BT inequality:

Theorem 3.1 (Discrete Bollobás-Thomason (BT) Inequality). Let $S \subset \mathbb{Z}^n$ be a finite set of $n$-dimensional grid points. Let $\mathcal{F}$ be a collection of subsets of $[n]$ in which every $i \in [n]$ occurs in exactly $d$ members of $\mathcal{F}$. Let $S_F$ be the set of projections $\mathbb{Z}^n \to \mathbb{Z}^F$ of points in $S$ onto the coordinates in $F$. Then, $|S|^d \le \prod_{F \in \mathcal{F}} |S_F|$.

To prove the equivalence between BT inequality and the AGM bound, we first need a simple observation. 

Lemma 3.2. Consider an instance of the OJ problem consisting of a hypergraph $H = (V,E)$, a fractional cover $x = (x_e)_{e \in E}$ of $H$, and relations $R_e$ for $e \in E$. Then, in linear time we can transform the instance into another instance $H' = (V, E')$, $x' = (x'_e)_{e \in E'}$, $(R'_e)_{e \in E'}$, such that the following properties hold:

(a) $x'$ is a "tight" fractional edge cover of the hypergraph $H'$, namely $x' \ge 0$ and

$$\sum_{e \in E' : v \in e} x'_e = 1, \text{ for every } v \in V.$$

(b) The two problems have the same answer:

$$\bowtie_{e \in E} R_e = \bowtie_{e \in E'} R'_e.$$

(c) AGM's bound on the transformed instance is at least as good as that of the original instance:

$$\prod_{e \in E'} |R'_e|^{x'_e} \le \prod_{e \in E} |R_e|^{x_e}.$$

Proof. We describe the transformation in steps. At each step properties (b) and (c) are kept as invariants. After all steps are done, (a) holds.

While there still exists some vertex $v \in V$ such that $\sum_{e \in E : v \in e} x_e > 1$, i.e. $v$'s constraint is not tight, let $f$ be an arbitrary hyperedge $f \in E$ such that $v \in f$. Partition $f$ into two parts $f = f_t \cup f_{\neg t}$, where $f_t$ consists of all vertices $u \in f$ such that $u$'s constraint is tight, and $f_{\neg t}$ consists of vertices $u \in f$ such that $u$'s constraint is not tight. Note that $v \in f_{\neg t}$.

Define $\rho = \min\left\{x_f, \min_{u \in f_{\neg t}} \left\{\sum_{e : u \in e} x_e - 1\right\}\right\}$. If we were able to reduce $x_f$ by $\rho$, then we would either turn $x_f$ to 0 or make some constraint for $u \in f_{\neg t}$ tight. However, reducing $x_f$ might violate some already tight constraint $u \in f_t$. The trick is to "break" $f$ into two parts.

We will set $E' = E \cup \{f_t\}$, create a "new" relation $R'_{f_t} = \pi_{f_t}(R_f)$, and keep all the old relations $R'_e = R_e$ for all $e \in E$. Set the variables $x'_e = x_e$ for all $e \in E - \{f\}$ also. The only two variables which have not been set are $x'_f$ and $x'_{f_t}$. We set them as follows.

• When $x_f \le \min_{u \in f_{\neg t}} \left\{\sum_{e : u \in e} x_e - 1\right\}$, set $x'_f = 0$ and $x'_{f_t} = x_f$.

• When $x_f > \min_{u \in f_{\neg t}} \left\{\sum_{e : u \in e} x_e - 1\right\}$, set $x'_f = x_f - \rho$ and $x'_{f_t} = \rho$.

Either way, it can be readily verified that the new instance is a legitimate OJ instance satisfying properties (b) and (c). In the first case, some positive variable in some non-tight constraint has been reduced to 0. In the second case, at least one non-tight constraint has become tight. Once we change a variable $x_f$ (essentially "break" it up into $x'_f$ and $x'_{f_t}$) we won't touch it again. Hence, after a linear number of steps in $|V|$, we will have all tight constraints. □

With this technical observation, we can now connect the two families of inequalities: 

Proposition 3.3. The BT inequality and AGM's fractional cover bound are equivalent.

Proof. To see that AGM's inequality implies the BT inequality, we think of each coordinate as an attribute, and the projections $S_F$ as the input relations. Set $x_F = 1/d$ for each $F \in \mathcal{F}$. It follows that $x = (x_F)_{F \in \mathcal{F}}$ is a fractional cover for the hypergraph $H = ([n], \mathcal{F})$. AGM's bound then implies that $|S| \le \prod_{F \in \mathcal{F}} |S_F|^{1/d}$.

Conversely, consider an instance of the OJ problem with hypergraph $H = (V,E)$ and a rational fractional cover $x = (x_e)_{e \in E}$ of $H$. First, by Lemma 3.2, we can assume that all cover constraints are tight, i.e.,

$$\sum_{e : v \in e} x_e = 1, \text{ for any } v \in V.$$

By standard arguments, it can be shown that all the "new" $x_e$ are rational values (even if the original values were not). Second, by writing all variables $x_e$ as $d_e/d$ for a positive common denominator $d$ we obtain

$$\sum_{e : v \in e} d_e = d, \text{ for any } v \in V.$$

Now, create $d_e$ copies of each relation $R_e$. Call the new relations $R'_e$. We obtain a new hypergraph $H' = (V, E')$ where every attribute $v$ occurs in exactly $d$ hyperedges. This is precisely the Bollobás-Thomason setting of Theorem 3.1. Hence, the size of the join is bounded above by $\prod_{e \in E'} |R'_e|^{1/d} = \prod_{e \in E} |R_e|^{d_e/d} = \prod_{e \in E} |R_e|^{x_e}$. □
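The duplication step in the converse direction can be illustrated with exact rational arithmetic. The sketch below takes a tight rational cover, computes the common denominator $d$ and the multiplicities $d_e$, and checks that every vertex lies in exactly $d$ edge copies. The function names and the four-edge example hypergraph are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import lcm

def duplicate_edges(x):
    """x: dict edge-name -> Fraction (a tight cover).
    Returns (d, copies) with x_e = copies[e] / d."""
    d = lcm(*(w.denominator for w in x.values()))
    copies = {e: int(w * d) for e, w in x.items()}
    return d, copies

def degree(edges, copies, v):
    """Number of edge copies containing vertex v in the duplicated hypergraph."""
    return sum(k for e, k in copies.items() if v in edges[e])

# Hypothetical tight cover: each of A, B, C receives total weight exactly 1.
edges = {"e1": {"A", "B"}, "e2": {"B", "C"}, "e3": {"A", "C"}, "e4": {"C"}}
x = {"e1": Fraction(2, 3), "e2": Fraction(1, 3),
     "e3": Fraction(1, 3), "e4": Fraction(1, 3)}

d, copies = duplicate_edges(x)
print(d, copies)  # 3 {'e1': 2, 'e2': 1, 'e3': 1, 'e4': 1}
print([degree(edges, copies, v) for v in "ABC"])  # [3, 3, 3]
```

After duplication every attribute occurs in exactly $d = 3$ hyperedges, which is the uniform-degree condition required by Theorem 3.1.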

Loomis-Whitney We now consider a special case of BT (or AGM), the discrete version of a classic geometric inequality called the Loomis-Whitney inequality [24]. The setting is that for $n \ge 2$, $V = [n]$ and $E = \binom{V}{|V|-1}$. In this case $x_e = 1/(|V| - 1)$, $\forall e \in E$, is a fractional cover solution for $(V, E)$, and LW showed the following:

Theorem 3.4 (Discrete Loomis-Whitney (LW) inequality). Let $S \subset \mathbb{Z}^n$ be a finite set of $n$-dimensional grid points. For each dimension $i \in [n]$, let $S_{[n] \setminus \{i\}}$ denote the $(n-1)$-dimensional projection of $S$ onto the coordinates $[n] \setminus \{i\}$. Then, $|S|^{n-1} \le \prod_{i=1}^{n} |S_{[n] \setminus \{i\}}|$.

It is clear from our discussion above that LW is a special case of BT (and so AGM), and it is with this special case that we begin our algorithmic development in the next section.
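The discrete LW inequality can be checked numerically on small point sets. The following Python sketch computes both sides of Theorem 3.4 for a random subset of a $4 \times 4 \times 4$ grid and for the full grid, where equality holds. The function name `lw_sides` and the choice of test sets are assumptions for illustration.

```python
import random
from itertools import product

def lw_sides(points):
    """Return (|S|^(n-1), product of the sizes of the n coordinate-hyperplane projections)."""
    n = len(next(iter(points)))
    lhs = len(points) ** (n - 1)
    rhs = 1
    for i in range(n):
        # Drop coordinate i to project onto the coordinates [n] \ {i}.
        projection = {p[:i] + p[i + 1:] for p in points}
        rhs *= len(projection)
    return lhs, rhs

random.seed(0)
grid = list(product(range(4), repeat=3))
sample = set(random.sample(grid, 20))

lhs, rhs = lw_sides(sample)
full_lhs, full_rhs = lw_sides(set(grid))
print(lhs <= rhs, full_lhs, full_rhs)  # True 4096 4096
```

The full grid is the discrete analogue of a box, the case in which the Loomis-Whitney bound is attained.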



4 Algorithm for Loomis-Whitney instances 



We first consider queries whose forms are slightly more general than that in our motivating example (Example 2.2). This class of queries has the same setup as in the LW inequality of Theorem 3.4. In this spirit, we define a Loomis-Whitney (LW) instance of the OJ problem to be a hypergraph $H = (V, E)$ such that $E$ is the collection of all subsets of $V$ of size $|V| - 1$. When the LW inequality is applied to this setting, it guarantees that $|\bowtie_{e \in E} R_e| \le \left(\prod_{e \in E} N_e\right)^{1/(|V|-1)}$, and the bound is tight in the worst case. The main result of this section is the following:

Theorem 4.1 (Loomis-Whitney instance). Let $n \ge 2$ be an integer. Consider a Loomis-Whitney instance $H = (V = [n], E)$ of the OJ problem with input relations $R_e$, where $|R_e| = N_e$ for $e \in E$. Then the join $\bowtie_{e \in E} R_e$ can be computed in time

$$O\left( n^2 \cdot \left( \prod_{e \in E} N_e \right)^{1/(n-1)} + n^2 \sum_{e \in E} N_e \right).$$

Before proving this result, we give an example that illustrates the intuition behind our algorithm and solves the motivating example from the introduction (1).

Example 4.2. Recall that our input has three relations $R(A,B)$, $S(B,C)$, $T(A,C)$ and an instance $I$ such that $|R^I| = |S^I| = |T^I| = N$. Let $J = R \bowtie S \bowtie T$. Our goal is to construct $J$ in time $O(N^{3/2})$. For exposition, define a parameter $\tau \ge 0$ that we will choose below. We use $\tau$ to define two sets that effectively partition the tuples in $R^I$:

$$D = \{t_B \in \pi_B(R) : |R^I[t_B]| > \tau\} \quad \text{and} \quad G = \{(t_A, t_B) \in R^I : t_B \notin D\}.$$

Intuitively, $D$ contains the heavy join keys in $R$. Note that $|D| < N/\tau$. Observe that $J \subseteq (D \times T) \cup (G \bowtie S)$ (also note that this union is disjoint). Our algorithm will construct $D \times T$ (resp. $G \bowtie S$) in time $O(N^{3/2})$, then it will keep only those tuples that are consistent with both $S$ and $R$ (resp. $T$), using the hash tables on $S$ and $R$ (resp. $T$); this process produces exactly $J$. Since our running time is linear in the sizes of the above sets, the key question is how big these two sets are.

Observe that $|D \times T| \le (N/\tau) N = N^2/\tau$ while $|G \bowtie S| = \sum_{t_B \in \pi_B(G)} |R[t_B]| \, |S[t_B]| \le \tau N$. Setting $\tau = \sqrt{N}$ makes both terms at most $N^{3/2}$, establishing the running time of our algorithm. One can check that if the relations are of different cardinalities, then we can still use the same algorithm; moreover, by setting $\tau = \sqrt{|R||T|/|S|}$, we achieve a running time of $O\left(\sqrt{|R||S||T|} + |R| + |S| + |T|\right)$.
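The heavy/light split of Example 4.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's general algorithm: relations are Python sets of pairs, dictionaries play the role of the hash tables, and the function name `triangle_heavy_light` is hypothetical.

```python
from collections import defaultdict
from math import sqrt

def triangle_heavy_light(R, S, T):
    """Join R(A,B), S(B,C), T(A,C) by splitting the B-values of R at threshold tau."""
    tau = sqrt(len(R) * len(T) / len(S)) if S else 0.0
    r_by_b = defaultdict(set)   # the sections R[t_B]
    for a, b in R:
        r_by_b[b].add(a)
    s_by_b = defaultdict(set)   # the sections S[t_B]
    for b, c in S:
        s_by_b[b].add(c)
    heavy = {b for b, a_vals in r_by_b.items() if len(a_vals) > tau}  # the set D
    out = set()
    # Heavy keys: enumerate D x T, keep tuples consistent with both R and S.
    for b in heavy:
        for a, c in T:
            if a in r_by_b[b] and c in s_by_b.get(b, ()):
                out.add((a, b, c))
    # Light keys: enumerate G join S, keep tuples consistent with T.
    for b, a_vals in r_by_b.items():
        if b in heavy:
            continue
        for a in a_vals:
            for c in s_by_b.get(b, ()):
                if (a, c) in T:
                    out.add((a, b, c))
    return out

R = {(1, 2), (3, 2), (1, 4)}
S = {(2, 5), (4, 5)}
T = {(1, 5), (3, 5)}
print(sorted(triangle_heavy_light(R, S, T)))  # [(1, 2, 5), (1, 4, 5), (3, 2, 5)]
```

In the small usage example, $\tau = \sqrt{3}$, so $b = 2$ (fanout 2) is heavy and $b = 4$ (fanout 1) is light; each output tuple is produced by exactly one of the two branches.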



To describe the general algorithm underlying Theorem 4.1, we need to introduce some data structures and notation.

Data Structures and Notation Let $H = (V, E)$ be an LW instance. Algorithm 1 begins by constructing a labeled, binary tree $\mathcal{T}$ whose set of leaves is exactly $V$ and in which each internal node has exactly two children. Any binary tree over this leaf set can be used. We denote the left child of any internal node $x$ as $\mathrm{lc}(x)$ and its right child as $\mathrm{rc}(x)$. Each node $x \in \mathcal{T}$ is labeled by a function label, where $\mathrm{label}(x) \subseteq V$ is defined inductively as follows: $\mathrm{label}(x) = V \setminus \{x\}$ for a leaf node $x \in V$, and $\mathrm{label}(x) = \mathrm{label}(\mathrm{lc}(x)) \cap \mathrm{label}(\mathrm{rc}(x))$ if $x$ is an internal node of the tree. It is immediate that for any internal node $x$ we have $\mathrm{label}(\mathrm{lc}(x)) \cup \mathrm{label}(\mathrm{rc}(x)) = V$ and that $\mathrm{label}(x) = \emptyset$ if and only if $x$ is the root of the tree. Let $J$ denote the output set of tuples of the join, i.e. $J = \bowtie_{e \in E} R_e$. For any node $x \in \mathcal{T}$, let $\mathcal{T}(x)$ denote the subtree of $\mathcal{T}$ rooted at $x$, and $\mathcal{L}(\mathcal{T}(x))$ denote the set of leaves under this subtree. For any three relations $R$, $S$, and $T$, define $R \bowtie_S T = (R \bowtie T) \ltimes S$.

Algorithm for LW instances Algorithm 1 works in two stages. Let $u$ be the root of the tree $\mathcal{T}$. First we compute a tuple set $C(u)$ containing the output $J$ such that $C(u)$ has a relatively small size (at most the size bound times $n$). Second, we prune those tuples that cannot participate in the join (which takes only linear time in the size of $C(u)$). The interesting part is how we compute $C(u)$. Inductively, we compute a set $C(x)$ that at each stage contains candidate tuples and an auxiliary set $D(x)$, which is a superset of the projection $\pi_{\mathrm{label}(x)}(J \setminus C(x))$. The set $D(x)$ will intuitively allow us to deal with those tuples that would blow up the size of an intermediate relation. The key novelty in Algorithm 1 is the construction of the set $G$ that contains all those tuples (join keys) that are in some sense light, i.e., joining over them would not exceed the size/time bound $P$ by much. The elements that are not light are postponed to be processed later by pushing them to the set $D(x)$. This is in full analogy to the sets $G$ and $D$ defined in Example 4.2.
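The inductive definition of the labels is short enough to sketch directly. In the Python illustration below, a tree is encoded as nested pairs whose leaves are the attributes; this encoding and the function name `build_labels` are assumptions, and leaves are assumed not to be tuples themselves.

```python
def build_labels(tree, universe):
    """tree: a leaf (an element of universe) or a pair (left, right).
    Returns a list of (node, label) pairs in post-order."""
    out = []

    def visit(node):
        if isinstance(node, tuple):
            left, right = node
            # Internal node: intersect the labels of the two children.
            lab = visit(left) & visit(right)
        else:
            # Leaf x: the label is V \ {x}.
            lab = frozenset(universe) - {node}
        out.append((node, lab))
        return lab

    visit(tree)
    return out

V = {1, 2, 3, 4}
tree = ((1, 2), (3, 4))
labels = dict(build_labels(tree, V))
print(sorted(labels[(1, 2)]), sorted(labels[(3, 4)]), sorted(labels[tree]))
# [3, 4] [1, 2] []
```

The output matches the two facts stated above: the labels of the two children of the root together cover $V$, and the root's label is empty.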



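Proof of Theorem 4.1 We claim that the following three properties hold for every node $x \in \mathcal{T}$:

(1) $\pi_{\mathrm{label}(x)}(J \setminus C(x)) \subseteq D(x)$;

(2) $|C(x)| \le (|\mathcal{L}(\mathcal{T}(x))| - 1) \cdot P$; and

(3) $|D(x)| \le \min\left\{ \min_{\ell \in \mathcal{L}(\mathcal{T}(x))} N_{[n] \setminus \{\ell\}},\ \frac{\prod_{\ell \in \mathcal{L}(\mathcal{T}(x))} N_{[n] \setminus \{\ell\}}}{P^{|\mathcal{L}(\mathcal{T}(x))| - 1}} \right\}$.

Assuming these three properties hold, let us first prove that Algorithm 1 correctly computes the join $J$. Let $u$ denote the root of the tree $\mathcal{T}$. By property (1),

$\pi_{\mathrm{label}(\mathrm{lc}(u))}(J \setminus C(\mathrm{lc}(u))) \subseteq D(\mathrm{lc}(u))$

$\pi_{\mathrm{label}(\mathrm{rc}(u))}(J \setminus C(\mathrm{rc}(u))) \subseteq D(\mathrm{rc}(u))$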



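Algorithm 1 Algorithm for Loomis-Whitney Instances

Input: An LW instance: $R_e$ for $e \in \binom{[n]}{n-1}$ and $N_e = |R_e|$.

$P \leftarrow \prod_{e \in E} N_e^{1/(n-1)}$ (the size bound from the LW inequality)
$u \leftarrow \mathrm{root}(\mathcal{T})$; $(C(u), D(u)) \leftarrow \mathrm{LW}(u)$
"Prune" $C(u)$ and return

LW($x$): $x \in \mathcal{T}$ returns $(C, D)$

1: if $x$ is a leaf then
2:     return $(\emptyset, R_{\mathrm{label}(x)})$
3: $(C_L, D_L) \leftarrow \mathrm{LW}(\mathrm{lc}(x))$ and $(C_R, D_R) \leftarrow \mathrm{LW}(\mathrm{rc}(x))$
4: $F \leftarrow \pi_{\mathrm{label}(x)}(D_L) \cap \pi_{\mathrm{label}(x)}(D_R)$
5: $G \leftarrow \{ t \in F : |D_L[t]| + 1 \le \lceil P / |D_R| \rceil \}$   // $F = G = \emptyset$ if $|D_R| = 0$
6: if $x$ is the root of $\mathcal{T}$ then
7:     $C \leftarrow (D_L \bowtie D_R) \cup C_L \cup C_R$
8: else
9:     $C \leftarrow (D_L \bowtie_G D_R) \cup C_L \cup C_R$
10: $D \leftarrow F \setminus G$
11: return $(C, D)$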



Hence,

$J \setminus (C(\mathrm{lc}(u)) \cup C(\mathrm{rc}(u))) \subseteq D(\mathrm{lc}(u)) \bowtie D(\mathrm{rc}(u)) = D(\mathrm{lc}(u)) \times D(\mathrm{rc}(u))$.

This implies $J \subseteq C(u)$. Thus, from $C(u)$ we can compute $J$ by keeping only tuples in $C(u)$ whose projection on any attribute set $e \in E = \binom{[n]}{n-1}$ is contained in $R_e$ (the "pruning" step).

We next show that properties 1-3 hold by induction on each step of Algorithm 1. For the base case, consider $\ell \in \mathcal{L}(\mathcal{T})$. Recall that in this case $C(\ell) = \emptyset$ and $D(\ell) = R_{[n] \setminus \{\ell\}}$; thus, properties 1-3 hold.

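Now assume that properties 1-3 hold for all children of an internal node $v$. We first verify properties 2-3 for $v$. From the definition of $G$,

$|D(\mathrm{rc}(v)) \bowtie_G D(\mathrm{lc}(v))| \le \left( \left\lceil \frac{P}{|D(\mathrm{rc}(v))|} \right\rceil - 1 \right) \cdot |D(\mathrm{rc}(v))| \le P$.

From the inductive upper bounds on $C(\mathrm{lc}(v))$ and $C(\mathrm{rc}(v))$, property 2 holds at $v$. By definition of $G$ and a straightforward counting argument, note that

$|D(v)| = |F \setminus G| \le |D(\mathrm{lc}(v))| \cdot \frac{1}{\lceil P / |D(\mathrm{rc}(v))| \rceil} \le \frac{|D(\mathrm{lc}(v))| \cdot |D(\mathrm{rc}(v))|}{P}$.

From the induction hypotheses on $\mathrm{lc}(v)$ and $\mathrm{rc}(v)$, we have

$|D(\mathrm{lc}(v))| \le \frac{\prod_{\ell \in \mathcal{L}(\mathcal{T}(\mathrm{lc}(v)))} N_{[n] \setminus \{\ell\}}}{P^{|\mathcal{L}(\mathcal{T}(\mathrm{lc}(v)))| - 1}}$

$|D(\mathrm{rc}(v))| \le \frac{\prod_{\ell \in \mathcal{L}(\mathcal{T}(\mathrm{rc}(v)))} N_{[n] \setminus \{\ell\}}}{P^{|\mathcal{L}(\mathcal{T}(\mathrm{rc}(v)))| - 1}}$

which implies that

$|D(v)| \le \frac{\prod_{\ell \in \mathcal{L}(\mathcal{T}(v))} N_{[n] \setminus \{\ell\}}}{P^{|\mathcal{L}(\mathcal{T}(v))| - 1}}$.

Further, it is easy to see that

$|D(v)| \le \min\left( |D(\mathrm{lc}(v))|, |D(\mathrm{rc}(v))| \right)$,

which by induction implies that

$|D(v)| \le \min_{\ell \in \mathcal{L}(\mathcal{T}(v))} N_{[n] \setminus \{\ell\}}$.

Property 3 is thus verified.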

Finally, we verify property 1. By induction, we have

$\pi_{\mathrm{label}(\mathrm{lc}(v))}(J \setminus C(\mathrm{lc}(v))) \subseteq D(\mathrm{lc}(v))$

$\pi_{\mathrm{label}(\mathrm{rc}(v))}(J \setminus C(\mathrm{rc}(v))) \subseteq D(\mathrm{rc}(v))$

This along with the fact that $\mathrm{label}(\mathrm{lc}(v)) \cap \mathrm{label}(\mathrm{rc}(v)) = \mathrm{label}(v)$ implies that

$\pi_{\mathrm{label}(v)}(J \setminus (C(\mathrm{lc}(v)) \cup C(\mathrm{rc}(v)))) \subseteq D(\mathrm{lc}(v))_{\mathrm{label}(v)} \cap D(\mathrm{rc}(v))_{\mathrm{label}(v)} = G \uplus D(v)$.

Further, every tuple in $J \setminus (C(\mathrm{lc}(v)) \cup C(\mathrm{rc}(v)))$ whose projection onto $\mathrm{label}(v)$ is in $G$ also belongs to $D(\mathrm{rc}(v)) \bowtie_G D(\mathrm{lc}(v))$. This implies that $\pi_{\mathrm{label}(v)}(J \setminus C(v)) \subseteq D(v)$, as desired.

For the run time complexity of Algorithm 1, we claim that for every node $x$, we need time $O(n|C(x)| + n|D(x)|)$. To see this, note that for each node $x$, the lines 4, 5, 7, and 9 of the algorithm can be computed in that much time using hashing. Using property (3) above, we have a (loose) upper bound of $O\left(nP + n \min_{\ell \in \mathcal{L}(\mathcal{T}(x))} N_{[n] \setminus \{\ell\}}\right)$ on the run time for node $x$. Summing the run time over all the nodes in the tree gives the claimed run time. $\square$

5 An Algorithm for All Join Queries 

This section presents our algorithm for proving the AGM's inequality with running time matching the bound. 

Theorem 5.1. Let $H = (V, E)$ be a hypergraph representing a natural join query. Let $n = |V|$ and $m = |E|$. Let $x = (x_e)_{e \in E}$ be an arbitrary point in the fractional cover polytope

$\sum_{e : v \in e} x_e \ge 1$, for any $v \in V$

$x_e \ge 0$, for any $e \in E$.

For each $e \in E$, let $R_e$ be a relation of size $N_e = |R_e|$ (number of tuples in the relation). Then,

(a) The join $\bowtie_{e \in E} R_e$ has size (number of tuples) bounded by

$\left| \bowtie_{e \in E} R_e \right| \le \prod_{e \in E} N_e^{x_e}$.

(b) Furthermore, the join $\bowtie_{e \in E} R_e$ can be computed in time

$O\left( mn \prod_{e \in E} N_e^{x_e} + n^2 \sum_{e \in E} N_e + m^2 n \right)$.



Remark 5.2. In the running time above, $m^2 n$ is the query preprocessing time, $n^2 \sum_{e \in E} N_e$ is the data preprocessing time, and $mn \prod_{e \in E} N_e^{x_e}$ is the query evaluation time. If all relations in the database are indexed in advance to satisfy the three conditions (ST$i$), $i \in \{1, 2, 3\}$, below, then we can remove the term $n^2 \sum_{e \in E} N_e$ from the running time. Also, the fractional cover solution $x$ should probably be the best fractional cover in terms of the linear objective $\sum_e (\log N_e) \cdot x_e$. The data-preprocessing time of $O(n^2 \sum_e N_e)$ is for a single known query. If we were to index all relations in advance without knowing which queries are to be evaluated, then the advance indexing takes $O(n \cdot n! \sum_e N_e)$ time. This price is paid once, up-front, for an arbitrary number of future queries.

Before turning to our algorithm and proof of this theorem, we observe that a consequence of this theorem is the 
following algorithmic version of the discrete version of BT inequality. 






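Corollary 5.3. Let $S \subset \mathbb{Z}^n$ be a finite set of $n$-dimensional grid points. Let $\mathcal{F}$ be a collection of subsets of $[n]$ in which every $i \in [n]$ occurs in exactly $d$ members of $\mathcal{F}$. Let $S_F$ be the set of projections $\mathbb{Z}^n \to \mathbb{Z}^F$ of points in $S$ onto the coordinates in $F$. Then,

$|S|^d \le \prod_{F \in \mathcal{F}} |S_F|$.

Furthermore, given the projections $S_F$ we can compute $S$ in time

$O\left( |\mathcal{F}| n \left( \prod_{F \in \mathcal{F}} |S_F| \right)^{1/d} + n^2 \sum_{F \in \mathcal{F}} |S_F| + |\mathcal{F}|^2 n \right)$.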

Recall that the LW inequality is a special case of the BT inequality. Hence, our algorithm proves the LW inequality 
as well. 

5.1 Main ingredients of the algorithm 

There are three key ingredients in the algorithm (Algorithm 2) and its analysis:

1. We first build a "search tree" for each relation $R_e$ which will be used throughout the algorithm. We can also build a collection of hash indices which functionally can serve the same purpose. We use the "search tree" data structure here to make the analysis clearer. This step is responsible for the (near-) linear term $O(n^2 \sum_{e \in E} N_e)$ in the running time. The search tree for each relation is built using a particular ordering of attributes in the relation called the total order. The total order is constructed from a data structure called a query plan tree which also drives the recursion structure of the algorithm.

2. Suppose we have two relations $A$ and $B$ on the same set of attributes and we'd like to compute $A \cap B$. If the search trees for $A$ and $B$ have already been built, the intersection can be computed in time $O(k \min\{|A|, |B|\})$ where $k$ is the number of attributes in $A$, because we can traverse every tuple of the smaller relation and check into the search structure for the larger relation. Also note that, for any two non-negative numbers $a$ and $b$ such that $a + b \ge 1$, we have $\min\{|A|, |B|\} \le |A|^a |B|^b$.

3. The third ingredient is based on "unrolling" sums using the generalized Hölder inequality (4) in a correct way. We cannot explain it in a few lines and thus will resort to an example presented in the next section. The example should give the reader the correct intuition into the entire algorithm and its analysis without getting lost in heavy notations.
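The second ingredient can be illustrated with a minimal sketch. It assumes relations are stored as Python sets of tuples, so a hash lookup stands in for the search-tree probe; the names `intersect` and `geometric_bound` are illustrative only.

```python
def intersect(A, B):
    """Intersect two relations (sets of tuples) by scanning only the smaller one."""
    small, large = (A, B) if len(A) <= len(B) else (B, A)
    # One membership probe into the larger relation per tuple of the smaller one.
    return {t for t in small if t in large}

def geometric_bound(size_a, size_b, a, b):
    """|A|^a * |B|^b, which is at least min(|A|, |B|) when a, b >= 0 and a + b >= 1."""
    return (size_a ** a) * (size_b ** b)
```

For example, with `|A| = 3`, `|B| = 2` and `a = b = 1/2`, the scan touches 2 tuples while the bound evaluates to the square root of 6.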

We make extensive use of the following form of Hölder's inequality which was also attributed to Jensen. (See the classic book "Inequalities" by Hardy, Littlewood, and Pólya [16], Theorem 22 on page 29.)

Lemma 5.4 (Hardy, Littlewood, and Pólya [16]). Let $m, n$ be positive integers. Let $y_1, \dots, y_n$ be non-negative real numbers such that $y_1 + \cdots + y_n \ge 1$. Let $a_{ij} \ge 0$ be non-negative real numbers, for $i \in [m]$ and $j \in [n]$. With the convention $0^0 = 0$, we have:

$\sum_{i=1}^{m} \prod_{j=1}^{n} a_{ij}^{y_j} \le \prod_{j=1}^{n} \left( \sum_{i=1}^{m} a_{ij} \right)^{y_j}$.   (4)
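As a quick numerical sanity check of the lemma, the following sketch evaluates both sides of the inequality for a given matrix and exponent vector. The function name `holder_sides` is illustrative, and the sketch assumes strictly positive entries so that the $0^0$ convention never comes into play.

```python
import math

def holder_sides(a, y):
    """Return (lhs, rhs) of the inequality for an m x n matrix a and exponents y.

    lhs = sum_i prod_j a[i][j] ** y[j]
    rhs = prod_j (sum_i a[i][j]) ** y[j]
    Assumes every a[i][j] > 0 (the 0^0 convention is not handled here).
    """
    m, n = len(a), len(y)
    lhs = sum(math.prod(a[i][j] ** y[j] for j in range(n)) for i in range(m))
    rhs = math.prod(sum(a[i][j] for i in range(m)) ** y[j] for j in range(n))
    return lhs, rhs
```

With `a = [[4, 1], [1, 9], [2, 2]]` and `y = [0.5, 0.5]` the left side is `2 + 3 + 2 = 7` and the right side is the square root of `7 * 12 = 84`.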

For each tuple $t$ on attribute set $A$, we will write $t$ as $t_A$ to emphasize the support of $t$: $t_A = (t_a)_{a \in A}$. Consider any relation $R$ with attribute set $S$. Let $A \subset S$ and $t_A$ be a fixed tuple. Then, $\pi_A(R)$ denotes the projection of $R$ down to attributes in $A$. And, define the $t_A$-section of $R$ to be

$R[t_A] := \pi_{S - A}(R \ltimes \{t_A\}) = \{ t_{S - A} \mid (t_A, t_{S - A}) \in R \}$.



In particular, $R[t_\emptyset] = R$.

Figure 1: A query plan tree for the example OJ instance
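The projection and section operators defined above can be made concrete with a small sketch. It assumes a relation is a list of dictionaries mapping attribute names to values; the function names `project` and `section` are illustrative only.

```python
def project(R, attrs):
    """pi_A(R): the distinct restrictions of the tuples of R to the attributes in attrs."""
    attrs = sorted(attrs)
    return {tuple((a, r[a]) for a in attrs) for r in R}

def section(R, t_A):
    """R[t_A]: tuples on the remaining attributes that extend the fixed tuple t_A within R."""
    return [
        {a: v for a, v in r.items() if a not in t_A}
        for r in R
        if all(r[a] == v for a, v in t_A.items())
    ]
```

Passing an empty dictionary as `t_A` returns a copy of the whole relation, matching the convention that the section on the empty tuple is the relation itself.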



5.2 A complete worked example for our algorithm and its analysis 

Before presenting the algorithm and analyzing it formally, we first work out a small query to explain how the algorithm and the analysis work. It should be noted that the following example does not cover all the intricacies of the general algorithm, especially in the boundary cases. We aim to convey the intuition first. Also, the way we label nodes in the QP-tree in this example is slightly different from the way nodes are labeled in the general algorithm, in order to avoid heavy subscripting.

Consider the following instance of the OJ problem. The hypergraph $H$ has 6 attributes $V = \{1, \dots, 6\}$, and 5 relations $R_a, R_b, R_c, R_d, R_e$ defined by the following vertex-edge incidence matrix $M$:

      a  b  c  d  e
  1   1  1  1  0  0
  2   1  0  1  1  0
  3   0  1  1  0  1
  4   1  1  0  1  0
  5   1  0  0  0  1
  6   0  1  0  1  1

We are given a fractional cover solution $x = (x_a, x_b, x_c, x_d, x_e)$, i.e. $Mx \ge 1$.
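To make the cover condition and the resulting size bound tangible, here is a minimal sketch. The hyperedge contents in `EDGES` are an assumption read off the search-tree attribute orders listed in Step 0 below; the function names, the sample weights and the sample relation sizes are hypothetical.

```python
import math

# Attributes of each relation in the example (assumed from the attribute orders in Step 0).
EDGES = {
    "a": {1, 2, 4, 5},
    "b": {1, 3, 4, 6},
    "c": {1, 2, 3},
    "d": {2, 4, 6},
    "e": {3, 5, 6},
}

def is_fractional_cover(edges, x, tol=1e-9):
    """Check x_e >= 0 for all e and sum of x_e over edges containing v >= 1 for all v."""
    vertices = set().union(*edges.values())
    if any(w < 0 for w in x.values()):
        return False
    return all(
        sum(x[e] for e, attrs in edges.items() if v in attrs) >= 1 - tol
        for v in vertices
    )

def agm_bound(sizes, x):
    """The output-size bound prod_e N_e ** x_e for relation sizes N_e."""
    return math.prod(sizes[e] ** x[e] for e in sizes)
```

For instance, giving every relation weight `1/2` is a valid cover here (attribute 5 is covered exactly, by `a` and `e`), and with all five relations of size 100 the bound evaluates to `100 ** 2.5`.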

Step 0. We first build something called a query plan tree (QP-tree). The tree has nodes labeled by the hyperedges $a, b, c, d, e$, except for the leaf nodes, each of which can be labeled by a subset of hyperedges. (Note again that the labeling in this example is slightly different from the labeling done in the general algorithm's description to avoid cumbersome notations.) Each node of the query plan tree also has an associated universe which is a subset of attributes. The reader is referred to Figure 1 for an illustration of the tree building process. In the figure, the universe for each node is drawn next to the parent edge to the node.

The query plan tree is built recursively as follows. We first arbitrarily order the hyperedges. In the example shown in Figure 1, we have built a tree with the order $e, d, c, b, a$. The root node has universe $V$. We visit these edges one by one in that order.

If every remaining hyperedge contains the universe $V$ then label the node with all remaining hyperedges and stop. In this case the node is a leaf node. Otherwise, consider the next hyperedge in the visiting order above (it is $e$ as we are in the beginning). Label the root with $e$, and create two children of the root $e$. The left child will have universe $V - e$, and the right child has $e$ as its universe. Now, we recursively build the left tree starting from the next hyperedge (i.e. $d$) in the ordering, but only restricting to the smaller universe $\{1, 2, 4\}$. Similarly, we build the right tree starting from the next hyperedge ($d$) in the ordering, but only restricting to the smaller universe $\{3, 5, 6\}$.

Let us explain one more level of the tree building process to make things clear. Consider the left tree of the root node $e$. The universe is $\{1, 2, 4\}$. The root node will be the next hyperedge $d$ in the ordering. But we really work on the restriction of $d$ in the universe $\{1, 2, 4\}$, which is $d' = d \cap \{1, 2, 4\} = \{2, 4\}$. Then, we create two children. The left child has universe $\{1, 2, 4\} - d' = \{1\}$. The right child has universe $d' = \{2, 4\}$. For the left child, the universe has size 1 and all three remaining hyperedges $a$, $b$, and $c$ contain 1, hence we label the left child with $abc$.

By visiting all leaf nodes from left to right and printing the attributes in their universes, we obtain something called the total order of all attributes in $V$. In the figure, the total order is 1, 4, 2, 5, 3, 6. (In the general case, the total order is slightly more complicated than in this example. See Procedure 4.)

Finally, based on the total order 1, 4, 2, 5, 3, 6 just obtained, we build search trees for all relations respecting this ordering. For relation $R_a$, the top level of the tree is indexed over attribute 1, the next two levels are 4 and 2, and the last level is indexed over attribute 5. For $R_b$, the order is 1, 4, 3, 6. For $R_c$ the order is 1, 2, 3. For $R_d$, the order is 4, 2, 6. For $R_e$, the order is 5, 3, 6. It will be clear later that the attribute orders in the search trees have a decisive effect on the overall running time. This is also the step that is responsible for the term $O(n^2 \sum_e N_e)$ in the overall running time.

Step 1. (This step corresponds to the left-most node of the query plan tree.) Compute the join

$T_1 = \pi_{\{1\}}(R_a) \bowtie \pi_{\{1\}}(R_b) \bowtie \pi_{\{1\}}(R_c)$   (5)

as follows. This is the join over attributes not in $d$ and $e$. If $|\pi_{\{1\}}(R_a)|$ is the smallest among $|\pi_{\{1\}}(R_a)|$, $|\pi_{\{1\}}(R_b)|$, and $|\pi_{\{1\}}(R_c)|$, then for each value $t_1 \in \pi_{\{1\}}(R_a)$, we search the first levels of the search trees for $R_b$ and $R_c$ to see if $t_1$ is in both $\pi_{\{1\}}(R_b)$ and $\pi_{\{1\}}(R_c)$. Similarly, if $|\pi_{\{1\}}(R_b)|$ or $|\pi_{\{1\}}(R_c)|$ is the smallest then for each $t_1 \in \pi_{\{1\}}(R_b)$ (or in $\pi_{\{1\}}(R_c)$) we search for $t_1$ in the other two search trees. As attribute 1 is in the first level of all three search trees, the join (5) can be computed in time

$O\left( \min\{ |\pi_{\{1\}}(R_a)|, |\pi_{\{1\}}(R_b)|, |\pi_{\{1\}}(R_c)| \} \right)$.

Note that

$|T_1| \le \min\{ |\pi_{\{1\}}(R_a)|, |\pi_{\{1\}}(R_b)|, |\pi_{\{1\}}(R_c)| \} \le |\pi_{\{1\}}(R_a)|^{x_a} |\pi_{\{1\}}(R_b)|^{x_b} |\pi_{\{1\}}(R_c)|^{x_c} \le N_a^{x_a} N_b^{x_b} N_c^{x_c}$

because $x_a + x_b + x_c \ge 1$. In particular, step 1 can be performed within the run-time budget.

Step 2. (This step corresponds to the node labeled $d$ on the left branch of the query plan tree.) Compute the join

$T_{\{1,2,4\}} = \pi_{\{1,2,4\}}(R_a) \bowtie \pi_{\{1,4\}}(R_b) \bowtie \pi_{\{1,2\}}(R_c) \bowtie \pi_{\{2,4\}}(R_d)$.

This is a join over all attributes not in $e$.

Since we have already computed the join $T_1$ over attribute 1 of $R_a$, $R_b$, and $R_c$, the relation $T_{\{1,2,4\}}$ can be computed by computing for every $t_1 \in T_1$ the $t_1$-section of $T_{\{1,2,4\}}$

$T_{\{1,2,4\}}[t_1] = \underbrace{\pi_{\{2,4\}}(R_a[t_1])}_{A[t_1]} \bowtie \underbrace{\pi_{\{4\}}(R_b[t_1])}_{B[t_1]} \bowtie \underbrace{\pi_{\{2\}}(R_c[t_1])}_{C[t_1]} \bowtie \underbrace{\pi_{\{2,4\}}(R_d)}_{D}$

and then $T_{\{1,2,4\}}$ is simply the union of all the $t_1$-sections $T_{\{1,2,4\}}[t_1]$. The notations $A[t_1]$, $B[t_1]$, $C[t_1]$, and $D$ are defined for the sake of brevity.

Fix $t_1 \in T_1$; we next describe how $T_{\{1,2,4\}}[t_1]$ is computed. If $x_d \ge 1$ then we go directly to case 2b below. When $x_d < 1$, define

$x'_a = \frac{x_a}{1 - x_d}, \quad x'_b = \frac{x_b}{1 - x_d}, \quad x'_c = \frac{x_c}{1 - x_d}$.

Consider the hypergraph $H'$ which is the graph $H$ restricted to the vertices 2, 4 and edges $a, b, c$. In particular, $H'$ has vertex set $\{2, 4\}$ and edges $\{2, 4\}$, $\{4\}$, $\{2\}$. It is clear that $x'_a$, $x'_b$, and $x'_c$ form a fractional cover solution of $H'$ because $x$ was a fractional cover solution for $H$. Thus, $H'$, $x' = (x'_a, x'_b, x'_c)$, and $A[t_1]$, $B[t_1]$, and $C[t_1]$ form an instance of the OJ problem. We will recursively solve this instance if a condition is satisfied.
Case 2a. Suppose

$|A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c} \le |D|$;

then we (recursively) compute the join $A[t_1] \bowtie B[t_1] \bowtie C[t_1]$. By induction on the instance $H'$, this join can be computed in time

$O\left( |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c} \right)$.

(This induction hypothesis corresponds to the node labeled $c$ on the left branch of the query plan tree.) Here, we crucially use the fact that the search trees for $R_a$, $R_b$, $R_c$ have been built so that the subtrees under the branch $t_1$ are precisely the search trees for relations $A[t_1]$, $B[t_1]$, $C[t_1]$ and thus are readily available to compute this join. Now, to get $T_{\{1,2,4\}}[t_1]$ we simply check whether every tuple in $A[t_1] \bowtie B[t_1] \bowtie C[t_1]$ belongs to $D$.

Case 2b. Suppose either $x_d \ge 1$ or

$|D| \le |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c}$;

then for every tuple $(t_2, t_4)$ in $D$ we check whether $(t_2, t_4) \in A[t_1]$, $t_4 \in B[t_1]$, and $t_2 \in C[t_1]$. The overall running time is $O(|D|)$.

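Thus, for a fixed value $t_1$, the relation $T_{\{1,2,4\}}[t_1]$ can be computed in time

$O\left( \min\{ |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c}, |D| \} \right)$.

In fact, it is not hard to see that

$|T_{\{1,2,4\}}[t_1]| \le \min\{ |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c}, |D| \}$.

This observation will eventually imply the size bound for this instance, and in the general case leads to the constructive proof of the inequality (2).

Next, note that

$\min\{ |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c}, |D| \} \le \left( |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c} \right)^{1 - x_d} |D|^{x_d}$

$= |A[t_1]|^{x_a} |B[t_1]|^{x_b} |C[t_1]|^{x_c} |D|^{x_d}$.

If $x_d \ge 1$ then the run-time is also in the order of $O(|A[t_1]|^{x_a} |B[t_1]|^{x_b} |C[t_1]|^{x_c} |D|^{x_d})$. Consequently, the total running time for step 2 is in the order of

$\sum_{t_1 \in T_1} |A[t_1]|^{x_a} |B[t_1]|^{x_b} |C[t_1]|^{x_c} |D|^{x_d} = |D|^{x_d} \sum_{t_1 \in T_1} |A[t_1]|^{x_a} |B[t_1]|^{x_b} |C[t_1]|^{x_c}$

$\le |D|^{x_d} \left( \sum_{t_1 \in T_1} |A[t_1]| \right)^{x_a} \left( \sum_{t_1 \in T_1} |B[t_1]| \right)^{x_b} \left( \sum_{t_1 \in T_1} |C[t_1]| \right)^{x_c}$

$\le |D|^{x_d} \cdot |\pi_{\{1,2,4\}}(R_a)|^{x_a} \cdot |\pi_{\{1,4\}}(R_b)|^{x_b} \cdot |\pi_{\{1,2\}}(R_c)|^{x_c}$.

The first inequality follows from the generalized Hölder inequality because $x_a + x_b + x_c \ge 1$ and $x_a, x_b, x_c \ge 0$. The second inequality says that if we sum over the sizes of the $t_1$-sections, we get at most the size of the relation. In summary, step 2 is still within the running time budget.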
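The interpolation step used in Step 2, bounding a minimum by a weighted geometric mean, can be spelled out as follows. This is a short worked derivation added for the reader; the symbols $p$, $q$ and $\theta$ are introduced here only for illustration, and the case $x_d < 1$ is assumed.

```latex
% For reals p, q >= 0 and theta in [0, 1]:
\min\{p, q\}
  = \min\{p, q\}^{1-\theta} \cdot \min\{p, q\}^{\theta}
  \le p^{1-\theta} \, q^{\theta}.
% Apply this with
%   p = |A[t_1]|^{x'_a} |B[t_1]|^{x'_b} |C[t_1]|^{x'_c},  q = |D|,  theta = x_d.
% Since x'_a (1 - x_d) = x_a (and likewise for b and c), we obtain
p^{1 - x_d} \, q^{x_d}
  = |A[t_1]|^{x_a} \, |B[t_1]|^{x_b} \, |C[t_1]|^{x_c} \, |D|^{x_d}.
```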
Step 3. Compute the final join over all attributes

$T_{\{1,2,3,4,5,6\}} = R_a \bowtie R_b \bowtie R_c \bowtie R_d \bowtie R_e$.

Since we have already computed the join $T_{\{1,2,4\}}$ over attributes 1, 2, 4 of $R_a$, $R_b$, $R_c$, and $R_d$, the relation $T_{\{1,2,3,4,5,6\}}$ can be computed by computing for every $(t_1, t_2, t_4) \in T_{\{1,2,4\}}$ the join

$T_{\{1,2,3,4,5,6\}}[t_1, t_2, t_4] = \underbrace{\pi_{\{5\}}(R_a[t_1, t_2, t_4])}_{A} \bowtie \underbrace{\pi_{\{3,6\}}(R_b[t_1, t_4])}_{B} \bowtie \underbrace{\pi_{\{3\}}(R_c[t_1, t_2])}_{C} \bowtie \underbrace{\pi_{\{6\}}(R_d[t_2, t_4])}_{D} \bowtie \underbrace{R_e}_{E}$,

and return the union of these joins over all tuples $(t_1, t_2, t_4) \in T_{\{1,2,4\}}$. Again, the notations $A, B, C, D, E$ are introduced for the sake of brevity. Note, however, that they are different from the $A, B, C, D$ from case 2. This step illustrates the third ingredient of the algorithm's analysis.

Fix $(t_1, t_2, t_4) \in T_{\{1,2,4\}}$. If $x_e \ge 1$ then we jump to case 3b; otherwise, define

$x''_a = \frac{x_a}{1 - x_e}, \quad x''_b = \frac{x_b}{1 - x_e}, \quad x''_c = \frac{x_c}{1 - x_e}, \quad x''_d = \frac{x_d}{1 - x_e}$.

Then define a hypergraph $H''$ on the attributes $\{3, 5, 6\}$ and the restrictions of $a, b, c, d$ on these attributes. Clearly the vector $x''$ is a fractional cover for this instance.

Case 3a. Suppose $x_e < 1$ and

$|A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d} \le |E|$.

By applying the induction hypothesis on the $H''$ instance we can compute the join $A \bowtie B \bowtie C \bowtie D$ in time $O\left( |A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d} \right)$. (The induction hypothesis corresponds to the node labeled $d$ on the right branch of the query plan tree.) Again, because the search trees for all relations have been built in such a way that the search trees for $A$, $B$, $C$, $D$ are already present on the $t_1, t_2, t_4$-branches of the trees for $R_a$, $R_b$, $R_c$, and $R_d$, there is no extra time spent on indexing for computing this join. Then, for every tuple $t_{\{3,5,6\}}$ in the join we check (the search tree for) $E$ to see if the tuple belongs to $E$.

Case 3b. Suppose $x_e \ge 1$ or

$|E| \le |A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d}$.

Then, for each tuple $t_{\{3,5,6\}} = (t_3, t_5, t_6) \in E$ we check to see whether $t_5 \in A$, $(t_3, t_6) \in B$, $t_3 \in C$, and $t_6 \in D$.

Either way, for a fixed tuple $(t_1, t_2, t_4) \in T_{\{1,2,4\}}$ the running time is

$O\left( \min\{ |A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d}, |E| \} \right)$.

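Now, we apply the same trick as in case 2:

$\min\{ |A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d}, |E| \} \le \left( |A|^{x''_a} |B|^{x''_b} |C|^{x''_c} |D|^{x''_d} \right)^{1 - x_e} |E|^{x_e}$

$= |A|^{x_a} |B|^{x_b} |C|^{x_c} |D|^{x_d} |E|^{x_e}$

$\le |R_a[t_1, t_2, t_4]|^{x_a} |R_b[t_1, t_4]|^{x_b} |R_c[t_1, t_2]|^{x_c} |R_d[t_2, t_4]|^{x_d} |R_e|^{x_e}$.

Hence, the total running time for step 3 is in the order of

$\sum_{(t_1, t_2, t_4) \in T_{\{1,2,4\}}} |R_a[t_1, t_2, t_4]|^{x_a} |R_b[t_1, t_4]|^{x_b} |R_c[t_1, t_2]|^{x_c} |R_d[t_2, t_4]|^{x_d} |R_e|^{x_e}$

$= |R_e|^{x_e} \sum_{t_1} \sum_{t_2} \sum_{t_4} |R_a[t_1, t_2, t_4]|^{x_a} |R_b[t_1, t_4]|^{x_b} |R_c[t_1, t_2]|^{x_c} |R_d[t_2, t_4]|^{x_d}$,

where the first sum is over $t_1 \in \pi_{\{1\}}(T_{\{1,2,4\}})$, the second sum is over $t_2$ such that $(t_1, t_2) \in \pi_{\{1,2\}}(T_{\{1,2,4\}})$, and the third sum is over $t_4$ such that $(t_1, t_2, t_4) \in T_{\{1,2,4\}}$. We apply Hölder's inequality several times to "unroll" the sums. Note that we crucially use the fact that $x$ is a fractional cover solution ($Mx \ge 1$) to apply Hölder's inequality.

$|R_e|^{x_e} \sum_{t_1} \sum_{t_2} \sum_{t_4} |R_a[t_1, t_2, t_4]|^{x_a} |R_b[t_1, t_4]|^{x_b} |R_c[t_1, t_2]|^{x_c} |R_d[t_2, t_4]|^{x_d}$

$= |R_e|^{x_e} \sum_{t_1} \sum_{t_2} |R_c[t_1, t_2]|^{x_c} \sum_{t_4} |R_a[t_1, t_2, t_4]|^{x_a} |R_b[t_1, t_4]|^{x_b} |R_d[t_2, t_4]|^{x_d}$

$\le |R_e|^{x_e} \sum_{t_1} \sum_{t_2} |R_c[t_1, t_2]|^{x_c} \left( \sum_{t_4} |R_a[t_1, t_2, t_4]| \right)^{x_a} \left( \sum_{t_4} |R_b[t_1, t_4]| \right)^{x_b} \left( \sum_{t_4} |R_d[t_2, t_4]| \right)^{x_d}$

$\le |R_e|^{x_e} \sum_{t_1} |R_b[t_1]|^{x_b} \sum_{t_2} |R_a[t_1, t_2]|^{x_a} |R_c[t_1, t_2]|^{x_c} |R_d[t_2]|^{x_d}$

$\le |R_e|^{x_e} \sum_{t_1} |R_b[t_1]|^{x_b} \left( \sum_{t_2} |R_a[t_1, t_2]| \right)^{x_a} \left( \sum_{t_2} |R_c[t_1, t_2]| \right)^{x_c} \left( \sum_{t_2} |R_d[t_2]| \right)^{x_d}$

$\le |R_e|^{x_e} |R_d|^{x_d} \sum_{t_1} |R_a[t_1]|^{x_a} |R_b[t_1]|^{x_b} |R_c[t_1]|^{x_c}$

$\le |R_e|^{x_e} |R_d|^{x_d} \left( \sum_{t_1} |R_a[t_1]| \right)^{x_a} \left( \sum_{t_1} |R_b[t_1]| \right)^{x_b} \left( \sum_{t_1} |R_c[t_1]| \right)^{x_c}$

$\le |R_a|^{x_a} |R_b|^{x_b} |R_c|^{x_c} |R_d|^{x_d} |R_e|^{x_e}$.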

5.3 Rigorous description and analysis of the algorithm 

Algorithm 2 computes the join of $m$ given relations. Besides the relations, the input to the algorithm consists of the hypergraph $H = (V, E)$ with $|V| = n$, $|E| = m$, and a point $x = (x_e)_{e \in E}$ in the fractional cover polytope

$\sum_{e : v \in e} x_e \ge 1$, for any $v \in V$

$x_e \ge 0$, for any $e \in E$.

1. We first build a query plan tree. The query plan tree serves two purposes: (a) it captures the structure of the 
recursions in the algorithm where each node of the tree roughly corresponds to a sub-problem, (b) it gives a total 
order of all the attributes based on which we can pre-build search trees for all the relations in the next step. 

2. From the query plan tree, we construct a total order of all attributes in $V$. Then, for each relation $R_e$ we construct a search tree for $R_e$ based on the relative order of $R_e$'s attributes imposed by the total order.

3. We traverse the query plan tree and solve some of the sub-problems and combine the solutions to form the final 
answer. It is important to note that not all sub-problems corresponding to nodes in the query plan trees will be 
solved. We decide whether to solve a sub-problem based on a "size check." Intuitively, if the sub-problem is 
estimated to have a large output size we will try to not solve it. 

We repeat some of the terminology already defined so that this section is relatively self-contained. For each tuple $t$ on attribute set $A$, we will write $t$ as $t_A$ to signify the fact that the tuple is on the attribute set $A$: $t_A = (t_a)_{a \in A}$. Consider any relation $R$ with attribute set $S$. Let $A \subset S$ and $t_A$ be a fixed tuple. Then $R[t_A]$ denotes the "$t_A$-section" of $R$, which is a relation on $S - A$ consisting of all tuples $t_{S-A}$ such that $(t_A, t_{S-A}) \in R$. In particular, $R[t_\emptyset] = R$. Let $\pi_A(R)$ denote the projection of $R$ down to attributes in $A$.

Algorithm 2 Computing the join $\bowtie_{e \in E} R_e$

Input: Hypergraph $H = (V, E)$, $|V| = n$, $|E| = m$
Input: Fractional cover solution $x = (x_e)_{e \in E}$
Input: Relations $R_e$, $e \in E$

1: Compute the query plan tree $\mathcal{T}$, let $u$ be $\mathcal{T}$'s root node
2: Compute a total order of attributes
3: Compute a collection of hash indices for all relations
4: return Recursive-Join($u$, $x$, nil)



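Algorithm 3 Constructing the query plan tree $\mathcal{T}$

1: Fix an arbitrary order $e_1, e_2, \dots, e_m$ of all the hyperedges in $E$.
2: $\mathcal{T} \leftarrow$ BUILD-TREE($V$, $m$)

BUILD-TREE($U$, $k$)

1: if $e_i \cap U = \emptyset$, $\forall i \in [k]$ then
2:     return nil
3: Create a node $u$ with $\mathrm{label}(u) \leftarrow k$ and $\mathrm{univ}(u) = U$
4: if $k > 1$ and $\exists i \in [k]$ such that $U \not\subseteq e_i$ then
5:     $\mathrm{lc}(u) \leftarrow$ BUILD-TREE($U \setminus e_k$, $k - 1$)
6:     $\mathrm{rc}(u) \leftarrow$ BUILD-TREE($U \cap e_k$, $k - 1$)
7: return $u$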



5.3.1 Step (1): Building the query plan tree 

Very roughly, each node $x$ and the sub-tree below it forms the "skeleton" of a sub-problem. There will be many sub-problems that correspond to each skeleton. The value $\mathrm{label}(x)$ points to an "anchor" relation for the sub-problem and $\mathrm{univ}(x)$ is the set of attributes that the sub-problem is joining on. The anchor relation divides the universe $\mathrm{univ}(x)$ into two parts to further sub-divide the recursion structure. Fix an arbitrary order $e_1, e_2, \dots, e_m$ of all the hyperedges in $E$. For notational convenience, for any $k \in [m]$ define $E_k = \{e_1, \dots, e_k\}$. The query plan tree $\mathcal{T}$ is a binary tree with the following associated information:

• Labels. Each node of $\mathcal{T}$ has a "label" $\mathrm{label}(u)$ which is an integer $k \in [m]$.

• Universes. Each node $u$ of $\mathcal{T}$ has a "universe" $\mathrm{univ}(u)$ which is a non-empty subset of attributes: $\mathrm{univ}(u) \subseteq V$.

• Each internal node $u$ of $\mathcal{T}$ has a left child $\mathrm{lc}(u)$ or a right child $\mathrm{rc}(u)$ or both. If a child does not exist then the child pointer points to nil.

Algorithm 3 builds the query plan tree $\mathcal{T}$.

Note that lines 5 and 6 will not be executed if $U \subseteq e_i$, $\forall i \in [k]$, in which case $u$ is a leaf node. When $u$ is not a leaf node, if $U \subseteq e_k$ then $u$ will not have a left child ($\mathrm{lc}(u)$ = nil). The running time for this pre-processing step is $O(m^2 n)$.

Figure 2 shows a query plan tree produced by Algorithm 3 on an example query.

5.3.2 Step (2): Computing a total order of the attributes and building the search trees 

From the query plan tree $\mathcal{T}$, Procedure 4 constructs a total order of all the attributes in $V$. We will call this ordering the total order of $V$. It is not hard to see that the total order satisfies the following proposition.

Proposition 5.5. The total order computed in Algorithm 4 satisfies the following properties.

(TO1) For every node $u$ in the query plan tree $\mathcal{T}$, all members of $\mathrm{univ}(u)$ are consecutive in the total order.

(TO2) For every internal node $u$, if $\mathrm{label}(u) = k$, $U = \mathrm{univ}(u)$, and $S$ is the set of all attributes preceding $\mathrm{univ}(u)$ in the total order, then $S \cup \mathrm{univ}(\mathrm{lc}(u)) = S \cup (U \setminus e_k)$ is precisely the set of all attributes preceding $\mathrm{univ}(\mathrm{rc}(u)) = e_k \cap U$ in the total order.

Figure 2: (a) A query q and (b) a sample QP tree for q.



Algorithm 4 Computing a total order of attributes in $V$

1: Let $\mathcal{T}$ be the query plan tree with root node $u$, where $\mathrm{univ}(u) = V$
2: PRINT-ATTRIBS($u$)

PRINT-ATTRIBS($u$)

if $u$ is a leaf node of $\mathcal{T}$ then
    print all attributes in $\mathrm{univ}(u)$ in an arbitrary order
else if $\mathrm{lc}(u)$ = nil then
    PRINT-ATTRIBS($\mathrm{rc}(u)$)
else if $\mathrm{rc}(u)$ = nil then
    PRINT-ATTRIBS($\mathrm{lc}(u)$)
    print all attributes in $\mathrm{univ}(u) \setminus \mathrm{univ}(\mathrm{lc}(u))$ in an arbitrary order
else
    PRINT-ATTRIBS($\mathrm{lc}(u)$)
    PRINT-ATTRIBS($\mathrm{rc}(u)$)
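The tree construction and the attribute ordering can be tried out with a short sketch. It is one possible transcription, under assumptions: hyperedges are Python sets held in a list `edges` with `edges[i - 1]` playing the role of $e_i$, "arbitrary order" is resolved by sorting, and a node counts as a leaf when both child pointers are nil. The class and function names, and the edge contents used in the usage note, are illustrative.

```python
class Node:
    def __init__(self, label, univ):
        self.label = label      # index k of the anchor hyperedge e_k
        self.univ = univ        # universe: set of attributes handled at this node
        self.lc = None
        self.rc = None

def build_tree(edges, U, k):
    """BUILD-TREE(U, k) over the hyperedges e_1..e_k (edges[0..k-1])."""
    if all(not (edges[i] & U) for i in range(k)):
        return None                              # no hyperedge touches U
    u = Node(k, U)
    if k > 1 and any(not U <= edges[i] for i in range(k)):
        e_k = edges[k - 1]
        u.lc = build_tree(edges, U - e_k, k - 1)
        u.rc = build_tree(edges, U & e_k, k - 1)
    return u

def total_order(u):
    """PRINT-ATTRIBS(u), returning the attributes as a list instead of printing."""
    if u.lc is None and u.rc is None:            # leaf node
        return sorted(u.univ)
    if u.lc is None:
        return total_order(u.rc)
    if u.rc is None:
        return total_order(u.lc) + sorted(u.univ - u.lc.univ)
    return total_order(u.lc) + total_order(u.rc)
```

As a usage example, taking the five relations of Section 5.2 with attribute sets `{1,2,4,5}`, `{1,3,4,6}`, `{1,2,3}`, `{2,4,6}`, `{3,5,6}` as $e_1, \dots, e_5$ (so that the edges are visited in the order $e, d, c, b, a$) yields the total order 1, 4, 2, 5, 3, 6 quoted there.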



For each relation $R_e$, $e \in E$, we order all attributes in $R_e$ such that the internal order of attributes in $R_e$ is consistent with the total order of all attributes computed by Algorithm 4. More concretely, suppose $R_e$ has $k$ attributes ordered $a_1, \dots, a_k$; then $a_i$ must come before $a_{i+1}$ in the total order, for all $1 \le i \le k - 1$. Then, we build a search tree (or any indexing data structure) for every relation $R_e$ using the internal order of $R_e$'s attributes: $a_1$ indexes level 1 of the tree, $a_2$ indexes the next level, ..., $a_k$ indexes the last level of the tree. The search tree for relation $R_e$ is constructed to satisfy the following three properties. Let $i$ and $j$ be arbitrary integers such that $1 \le i < j \le k$. Let $t_{\{a_1, \dots, a_i\}} = (t_{a_1}, \dots, t_{a_i})$ be an arbitrary tuple on the attributes $\{a_1, \dots, a_i\}$.

(ST1) We can decide whether $t_{\{a_1, \dots, a_i\}} \in \pi_{\{a_1, \dots, a_i\}}(R_e)$ in $O(i)$ time (by "stepping down" the tree along the $t_{a_1}, \dots, t_{a_i}$ path).

(ST2) We can query the size $|\pi_{\{a_{i+1}, \dots, a_j\}}(R_e[t_{\{a_1, \dots, a_i\}}])|$ in $O(i)$ time.

(ST3) We can list all tuples in the set $\pi_{\{a_{i+1}, \dots, a_j\}}(R_e[t_{\{a_1, \dots, a_i\}}])$ in time linear in the output size if the output is not empty.
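One way to meet properties (ST1) to (ST3) is a trie whose nodes store, for every depth below them, the number of distinct paths of that length. The sketch below is an illustration under assumptions, not the authors' data structure: tuples are given with their attributes already arranged in the relation's internal order, and all names (`_Node`, `build_search_tree`, `descend`, `section_size`, `list_section`) are hypothetical.

```python
class _Node:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = [1]    # count[d] = number of distinct length-d paths below this node

def _fill_counts(node, depth_left):
    for child in node.children.values():
        _fill_counts(child, depth_left - 1)
    node.count = [1] + [
        sum(c.count[d - 1] for c in node.children.values())
        for d in range(1, depth_left + 1)
    ]

def build_search_tree(tuples, arity):
    """Build the trie for a relation whose tuples all have length `arity`."""
    root = _Node()
    for t in tuples:
        node = root
        for v in t:
            node = node.children.setdefault(v, _Node())
    _fill_counts(root, arity)
    return root

def descend(root, prefix):
    """(ST1) Step down along `prefix`; return the node reached, or None if absent."""
    node = root
    for v in prefix:
        node = node.children.get(v)
        if node is None:
            return None
    return node

def section_size(root, prefix, width):
    """(ST2) Size of the projection of the prefix-section onto the next `width` attributes."""
    node = descend(root, prefix)
    return 0 if node is None else node.count[width]

def list_section(root, prefix, width):
    """(ST3) List the projection of the prefix-section onto the next `width` attributes."""
    node = descend(root, prefix)
    out = []
    def walk(n, path):
        if len(path) == width:
            out.append(tuple(path))
            return
        for v, child in n.children.items():
            walk(child, path + [v])
    if node is not None:
        walk(node, [])
    return out
```

Storing one counter per remaining level at every node costs time proportional to the arity per node, which is in the spirit of the quadratic dependence on the number of attributes in the preprocessing term.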

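The total running time for building all the search trees is $O(n^2 \sum_e N_e)$.

Procedure 5 Recursive-Join($u$, $y$, $t_S$)

1: Let $U = \mathrm{univ}(u)$, $k = \mathrm{label}(u)$
2: Ret $\leftarrow \emptyset$   // Ret is the returned tuple set
3: if $u$ is a leaf node of $\mathcal{T}$ then   // note that $U \subseteq e_i$, $\forall i \le k$
4:     $j \leftarrow \arg\min_{i \in [k]} \{ |\pi_U(R_{e_i}[t_{S \cap e_i}])| \}$
5:     // By convention, $R_e[\mathrm{nil}] = R_e$ and $R_e[t_\emptyset] = R_e$
6:     for each tuple $t_U \in \pi_U(R_{e_j}[t_{S \cap e_j}])$ do
7:         if $t_U \in \pi_U(R_{e_i}[t_{S \cap e_i}])$, for all $i \in [k] \setminus \{j\}$ then
8:             Ret $\leftarrow$ Ret $\cup \{(t_S, t_U)\}$
9:     return Ret
10: if $\mathrm{lc}(u)$ = nil then   // $u$ is not a leaf node of $\mathcal{T}$
11:     $L \leftarrow \{t_S\}$
12:     // note that $L \ne \emptyset$ and $t_S$ could be nil (when $S = \emptyset$)
13: else
14:     $L \leftarrow$ Recursive-Join($\mathrm{lc}(u)$, $(y_1, \dots, y_{k-1})$, $t_S$)
15: $W \leftarrow U \setminus e_k$, $W^- \leftarrow e_k \cap U$
16: if $W^- = \emptyset$ then
17:     return $L$
18: for each tuple $t_{S \cup W} = (t_S, t_W) \in L$ do
19:     if $y_{e_k} \ge 1$ then
20:         go to line 27
21:     if $\prod_{i=1}^{k-1} |\pi_{e_i \cap W^-}(R_{e_i}[t_{(S \cup W) \cap e_i}])|^{\frac{y_{e_i}}{1 - y_{e_k}}} \le |\pi_{W^-}(R_{e_k}[t_{S \cap e_k}])|$ then
22:         $Z \leftarrow$ Recursive-Join($\mathrm{rc}(u)$, $\left( \frac{y_{e_i}}{1 - y_{e_k}} \right)_{i=1}^{k-1}$, $t_{S \cup W}$)
23:         for each tuple $(t_S, t_W, t_{W^-}) \in Z$ do
24:             if $t_{W^-} \in \pi_{W^-}(R_{e_k}[t_{S \cap e_k}])$ then
25:                 Ret $\leftarrow$ Ret $\cup \{(t_S, t_W, t_{W^-})\}$
26:     else
27:         for each tuple $t_{W^-} \in \pi_{W^-}(R_{e_k}[t_{S \cap e_k}])$ do
28:             if $t_{e_i \cap W^-} \in \pi_{e_i \cap W^-}(R_{e_i}[t_{(S \cup W) \cap e_i}])$ for all $e_i$ such that $i < k$ and $e_i \cap W^- \ne \emptyset$ then
29:                 Ret $\leftarrow$ Ret $\cup \{(t_S, t_W, t_{W^-})\}$
30: return Ret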

5.3.3 Step (3): Computing the join 

At the heart of Algorithm 2 is a recursive procedure called Recursive-Join (Procedure 5) which takes three arguments:

• a node $u$ from the query plan tree $\mathcal{T}$ whose label is $k$ for some $k \in [m]$.

• a fractional cover solution $y_{E_k} = (y_{e_1}, \dots, y_{e_k})$ of the hypergraph $(\mathrm{univ}(u), E_k)$. Here, we only take the restrictions of hyperedges of $E_k$ onto the universe $\mathrm{univ}(u)$. Specifically,

$\sum_{e \in E_k : i \in e} y_e \ge 1$, for any $i \in \mathrm{univ}(u)$

$y_e \ge 0$, for any $e \in E_k$

• a tuple $t_S = (t_i)_{i \in S}$ where $S$ is the set of all attributes in $V$ which precede $\mathrm{univ}(u)$ in the total order. (Due to property (TO1) of Proposition 5.5 the set $S$ is well-defined.) If there is no attribute preceding $\mathrm{univ}(u)$ then this argument is nil. In particular, the argument is nil if $u$ is a node along the left path of the QP-tree $\mathcal{T}$ from the root down to the left-most leaf.

Throughout this section, we denote the final output by $J$ which is defined to be $J = \bowtie_{e \in E} R_e$. The goal of Recursive-Join is to compute a superset of the relation $\{t_S\} \times \pi_{\mathrm{univ}(u)}(J[t_S])$, i.e., a superset of the output tuples that start with $t_S$ on the attributes $S \cup \mathrm{univ}(u)$. This intermediate output is analogous to the set $C$ in Algorithm 1 for LW instances. A second similarity to Algorithm 1 is that our algorithm makes a choice per tuple based on the output's estimated size.

Theorem 5.1 is a special case of the following lemma where we set $u$ to be the root of the QP-tree $\mathcal{T}$, $y = x$, and $S = \emptyset$ ($t_S$ = nil). Finally, we observe that we need only $O(n^2)$ hash indices per input relation, which completes the proof.

Lemma 5.6. Consider a call Recursive-Join($u$, $y$, $t_S$) to Procedure 5. Let $k = \mathrm{label}(u)$ and $U = \mathrm{univ}(u)$. Then,

(a) The procedure outputs a relation Ret on attributes $S \cup U$ with at most the following number of tuples

$B(u, y, t_S) := \prod_{i=1}^{k} |\pi_{U \cap e_i}(R_{e_i}[t_{S \cap e_i}])|^{y_i}$.

(For the sake of presentation, we agree on the convention that when $U \cap e_i = \emptyset$ we set $|\pi_{U \cap e_i}(R_{e_i}[t_{S \cap e_i}])| = 1$ so that the factor does not contribute anything to the product.)

(b) Furthermore, the procedure runs in time $O(mn \cdot B(u, y, t_S))$.

Proof. We prove both (a) and (b) by induction on the height of the subtree of $\mathcal{T}$ rooted at $u$. The proof will also explain in "plain" English the algorithm presented in Procedure 5. The procedure tries to compute the join

$\{t_S\} \times \bowtie_{i=1}^{k} \pi_{U \cap e_i}(R_{e_i}[t_{S \cap e_i}])$.

Roughly speaking, it is computing the join of all the sections $R_{e_i}[t_{S \cap e_i}]$ inside the universe $U$.

Base case. The height of the sub-tree rooted at $u$ is zero, i.e. $u$ is a leaf node. In this case, lines 4-9 of Procedure 5 are executed. When $u$ is a leaf node, $U \subseteq e_i$, $\forall i \in [k]$, and thus $U = U \cap e_i$, $\forall i \in [k]$. Since $y$ is a fractional cover solution to the hypergraph instance $(U, E_k)$, we know $\sum_{i=1}^{k} y_i \ge 1$. The join has size at most

$\min_{i \in [k]} |\pi_U(R_{e_i}[t_{S \cap e_i}])| \le \prod_{i=1}^{k} |\pi_{U \cap e_i}(R_{e_i}[t_{S \cap e_i}])|^{y_i} = B(u, y, t_S)$.

To compute the join, we go over each tuple of the smallest-sized section-projection $\pi_U(R_{e_j}[t_{S \cap e_j}])$ and check to see if the tuple belongs to all the other section-projections. There are at most $k$ other sections, and due to property (ST1) each check takes time $O(n)$. Hence, the total time spent is $O(mn \cdot B(u, y, t_S))$.

Inductive step. Now, consider the case when $u$ is not a leaf node.

If $\mathrm{lc}(u)$ = nil, which means $U \subseteq e_k$, then there is no attribute in $U \setminus e_k$ to join over (line 11). Otherwise, we first recursively call the "left sub-problem" (line 14) and store the result in $L$. Note that the attribute set of $L$ is $S \cup W = S \cup (U \setminus e_k)$. We need to verify that the arguments we gave to this recursive call are legitimate. It should be obvious that $k - 1 = \mathrm{label}(\mathrm{lc}(u))$. Since $y = (y_1, \dots, y_k)$ is a fractional cover of the $(U, E_k)$ hypergraph, $y' = (y_1, \dots, y_{k-1})$ is a fractional cover of the $(U \setminus e_k, E_{k-1})$ hypergraph. And, $\mathrm{univ}(\mathrm{lc}(u)) = U \setminus e_k$. Finally, due to property (TO2), $S$ is precisely the set of attributes preceding $\mathrm{univ}(\mathrm{lc}(u))$ in the total order. From the induction hypothesis, the recursive call on line 14 takes time

$O(mn \cdot B(\mathrm{lc}(u), y', t_S)) = O\left( mn \prod_{i=1}^{k-1} |\pi_{W \cap e_i}(R_{e_i}[t_{S \cap e_i}])|^{y_i} \right)$.

Furthermore, the number of tuples in $L$ is at most $B(\mathrm{lc}(u), y', t_S) = \prod_{i=1}^{k-1} |\pi_{W \cap e_i}(R_{e_i}[t_{S \cap e_i}])|^{y_i}$.

If $W^- = \emptyset$ then $L$ is returned and we are done because in this case $B(\mathrm{lc}(u), y', t_S) \le B(u, y, t_S)$.

Consider the for loop from line 18 to line 29. We execute the for loop for each tuple $t_{S \cup W} = (t_S, t_W) \in L$. If $L = \emptyset$ then the output is empty and we are done. If $L = \{t_S\}$ then this for-loop is executed only once. This is the case if the assignment in line 11 was performed, which means $U \subseteq e_k$ and thus $W = \emptyset$. We do not have to analyze this case separately as it is subsumed by the general case that $L \ne \emptyset$.

Note that if $y_{e_k} \ge 1$ then we go directly to case b below (corresponding to line 27).

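Case a. Consider the case when $y_{e_k} < 1$ and

$$\prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}} < \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|.$$

In this case we first recursively solve the sub-problem

$$Z = \text{Recursive-Join}\left(\mathrm{RC}(u),\ \left(\frac{y_{e_i}}{1 - y_{e_k}}\right)_{i=1}^{k-1},\ t_{S \cup W}\right).$$

We need to make sure that the arguments are legitimate. Note that $\mathrm{univ}(\mathrm{RC}(u)) = W^-$, and that $y_{e_k} < 1$. The sub-problem is on the hypergraph $(W^-, E_{k-1})$. For any $v \in W^- = U \cap e_k$, because $y$ is a fractional cover of the $(U, E_k)$ hypergraph,

$$1 \le \sum_{i \in [k] :\, v \in e_i} y_{e_i} = y_{e_k} + \sum_{i \in [k-1] :\, v \in e_i} y_{e_i}.$$

Hence,

$$1 \le \sum_{i \in [k-1] :\, v \in e_i} \frac{y_{e_i}}{1 - y_{e_k}},$$

which confirms that the solution $\left(\frac{y_{e_i}}{1 - y_{e_k}}\right)_{i=1}^{k-1}$ is a fractional cover for the hypergraph $(W^-, E_{k-1})$. Finally, by property (TO2) the attributes $S \cup W$ are precisely the attributes preceding $W^-$ in the total order.

After solving the sub-problem we obtain a tuple set $Z$ over the attributes $S \cup W \cup W^- = S \cup U$. By the induction hypothesis the time it takes to solve the sub-problem is

$$O\left(mn \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}}\right)$$

and the number of tuples in $Z$ is bounded by

$$\prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}}.$$

Then, for each tuple in $Z$ we perform the check on line 24. Hence, the overall running time in this case is still

$$O\left(mn \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}}\right).$$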

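Case b. Consider the case when either $y_{e_k} \ge 1$ or

$$\prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}} \ge \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|.$$

In this case, we execute lines 27 to 29. The number of tuples output is at most $|\pi_{W^-}(R_{e_k}[t_{S \cap e_k}])|$ and the running time is $O(mn\,|\pi_{W^-}(R_{e_k}[t_{S \cap e_k}])|)$.

Overall, for both (case a) and (case b) the number of tuples output is bounded by

$$T = \begin{cases} \min\left\{ \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}},\ \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right| \right\} & \text{if } y_{e_k} < 1, \\[4pt] \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right| & \text{otherwise,} \end{cases}$$

and the running time is in the order of $O(mnT)$. We bound $T$ next. When $y_{e_k} < 1$ we have

$$\begin{aligned}
T &\le \min\left\{ \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}},\ \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right| \right\} \\
&\le \left( \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{\frac{y_{e_i}}{1 - y_{e_k}}} \right)^{1 - y_{e_k}} \cdot \left|\pi_{W^-}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|^{y_{e_k}} \\
&= \left|\pi_{U \cap e_k}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|^{y_{e_k}} \cdot \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{y_{e_i}}.
\end{aligned}$$

When $y_{e_k} \ge 1$, it is obvious that the same inequality holds:

$$T \le \left|\pi_{U \cap e_k}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|^{y_{e_k}} \cdot \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{y_{e_i}}.$$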

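Summing over all $(t_S, t_W) \in L$, the number of output tuples is bounded by the following sum. Without loss of generality, assume $W = \{1, \dots, d\} = [d]$. In the following, the first sum is over $t_1 \in \pi_{\{1\}}(L)$, the second sum is over $t_2$ such that $(t_1, t_2) \in \pi_{\{1,2\}}(L)$, and so on. To shorten notations a little, define

$$R_i := R_{e_i}[t_{S \cap e_i}].$$

Then, the total number of output tuples is bounded by

$$\begin{aligned}
&\sum_{(t_S, t_W) \in L} \left|\pi_{U \cap e_k}\big(R_{e_k}[t_{S \cap e_k}]\big)\right|^{y_{e_k}} \cdot \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_{e_i}[t_{(S \cup W) \cap e_i}]\big)\right|^{y_{e_i}} \\
&= |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \sum_{t_2} \cdots \sum_{t_d} \prod_{i=1}^{k-1} \left|\pi_{e_i \cap W^-}\big(R_i[t_{[d] \cap e_i}]\big)\right|^{y_{e_i}} \\
&= |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \cdots \sum_{t_{d-1}} \prod_{i<k,\, d \notin e_i} \left|\pi_{e_i \cap W^-}\big(R_i[t_{[d] \cap e_i}]\big)\right|^{y_{e_i}} \sum_{t_d} \prod_{i<k,\, d \in e_i} \left|\pi_{e_i \cap W^-}\big(R_i[t_{[d] \cap e_i}]\big)\right|^{y_{e_i}} \\
&\le |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \cdots \sum_{t_{d-1}} \prod_{i<k,\, d \notin e_i} \left|\pi_{e_i \cap W^-}\big(R_i[t_{[d] \cap e_i}]\big)\right|^{y_{e_i}} \prod_{i<k,\, d \in e_i} \left( \sum_{t_d} \left|\pi_{e_i \cap W^-}\big(R_i[t_{[d] \cap e_i}]\big)\right| \right)^{y_{e_i}} \\
&\le |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \cdots \sum_{t_{d-1}} \prod_{i<k,\, d \notin e_i} \left|\pi_{e_i \cap (W^- \cup \{d\})}\big(R_i[t_{[d-1] \cap e_i}]\big)\right|^{y_{e_i}} \prod_{i<k,\, d \in e_i} \left|\pi_{e_i \cap (W^- \cup \{d\})}\big(R_i[t_{[d-1] \cap e_i}]\big)\right|^{y_{e_i}} \\
&= |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \sum_{t_2} \cdots \sum_{t_{d-1}} \prod_{i=1}^{k-1} \left|\pi_{e_i \cap (W^- \cup \{d\})}\big(R_i[t_{[d-1] \cap e_i}]\big)\right|^{y_{e_i}} \\
&\le |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \sum_{t_1} \sum_{t_2} \cdots \sum_{t_{d-2}} \prod_{i=1}^{k-1} \left|\pi_{e_i \cap (W^- \cup \{d-1, d\})}\big(R_i[t_{[d-2] \cap e_i}]\big)\right|^{y_{e_i}} \\
&\le \cdots \\
&\le |\pi_{U \cap e_k}(R_k)|^{y_{e_k}} \prod_{i=1}^{k-1} |\pi_{U \cap e_i}(R_i)|^{y_{e_i}}.
\end{aligned}$$

The first inequality is Hölder's inequality, which applies because the attribute $d \in W = U \setminus e_k$ is covered by the edges $e_i$ with $i < k$, so that $\sum_{i<k,\, d \in e_i} y_{e_i} \ge 1$. The final product is exactly $B(u, y, t_S)$.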



6 Limits of Standard Approaches 

For a given join query $q$, we describe a sufficient syntactic condition on $q$ under which any join-project plan computing $q$ is asymptotically slower than the worst-case bound. Our algorithm runs within this bound, and so for such $q$ there is an asymptotic running-time gap.

LW Instances. Recall that an LW instance of the OJ problem is a join query $q$ represented by the hypergraph $(V, E)$, where $V = [n]$ and $E = \binom{[n]}{n-1}$ for some integer $n \ge 2$. Our main result in this section is the following lemma.^

Lemma 6.1. Let $n \ge 2$ be an arbitrary integer. Given any LW-query $q$ represented by a hypergraph $([n], \binom{[n]}{n-1})$, and any positive integer $N \ge 2$, there exist relations $R_i$, $i \in [n]$, such that $|R_i| = N$ for all $i \in [n]$, the attribute set for $R_i$ is $[n] - \{i\}$, and any join-project plan for $q$ on these relations runs in time $\Omega(N^2/n^2)$.

Before proving the lemma, we note that both the traditional join-tree algorithm and AGM's algorithm are join-project plans, and thus their running times are asymptotically worse than the best AGM bound for this instance, which is $|\bowtie_{i=1}^{n} R_i| \le \prod_{i=1}^{n} |R_i|^{1/(n-1)} = N^{1+1/(n-1)}$. On the other hand, both Algorithm 1 and Algorithm 5 take $O(N^{1+1/(n-1)})$ time as we have analyzed. In fact, for Algorithm 2 we are able to demonstrate a stronger result: its run-time on this instance is $O(n^2 N)$, which is better than what we can analyze for a general instance of this type. In particular, the run-time gap between Algorithm 2 and AGM's algorithm is $\Omega(N)$ for constant $n$.

Proof of Lemma 6.1. In the instances below the domain of any attribute will be $D = \{0, 1, \dots, (N-1)/(n-1)\}$. For the sake of clarity, we ignore the integrality issue. For any $i \in [n]$, let $R_i$ be the set of all tuples in $D^{[n] - \{i\}}$ each of which has at most one non-zero value. Then, it is not hard to see that $|R_i| = (n-1)[(N-1)/(n-1) + 1] - (n-2) = N$ for all $i \in [n]$, and $|\bowtie_{i=1}^{n} R_i| = n[(N-1)/(n-1) + 1] - (n-1) = N + (N-1)/(n-1) > N$.

A relation $R$ on attribute set $A \subseteq [n]$ is called "simple" if $R$ is the set of all tuples in $D^A$ each of which has at most one non-zero value. Then, we observe the following properties. (a) The input relations $R_i$ are simple. (b) An arbitrary projection of a simple relation is simple. (c) Let $S$ and $T$ be any two simple relations on attribute sets $A_S$ and $A_T$, respectively. If $A_S$ is contained in $A_T$ or vice versa, then $S \bowtie T$ is simple. If neither $A_S$ nor $A_T$ is contained in the other, then $|S \bowtie T| \ge (1 + (N-1)/(n-1))^2 = \Omega(N^2/n^2)$.

For an arbitrary join-project plan starting from the simple relations $R_i$, we eventually must join two relations whose attribute sets are not contained in one another, which right then requires $\Omega(N^2/n^2)$ run time. □
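The counting claims in the proof can be checked on a small instance. The sketch below is illustrative only: the helper names (`simple_relation`, `natural_join`, `lw_instance`) and the encoding of a tuple as a sorted sequence of `(attribute, value)` pairs are assumptions made for this example, and the join is a naive nested loop rather than any algorithm from the paper.

```python
from itertools import product

def simple_relation(attrs, domain):
    """All tuples over `attrs` (a sorted tuple) with at most one non-zero value."""
    rel = set()
    for values in product(domain, repeat=len(attrs)):
        if sum(1 for v in values if v != 0) <= 1:
            rel.add(tuple(zip(attrs, values)))
    return rel

def natural_join(r, s):
    """Naive nested-loop natural join; tuples are sorted (attribute, value) pairs."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            # Two tuples are compatible if they agree on all shared attributes.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out

def lw_instance(n, N):
    """Relations R_1..R_n of Lemma 6.1; requires (n - 1) to divide (N - 1)."""
    assert (N - 1) % (n - 1) == 0
    domain = range((N - 1) // (n - 1) + 1)
    return [
        simple_relation(tuple(a for a in range(1, n + 1) if a != i), domain)
        for i in range(1, n + 1)
    ]
```

For $n = 3$ and $N = 5$ each relation has exactly $N = 5$ tuples, while joining any two of them already produces at least $(1 + (N-1)/(n-1))^2 = 9$ tuples, even though the full join is small.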



Finally, we analyze the run-time of Algorithm [2] directly on this instance without resorting to Lemma 5A_ Holder's 
inequality lost some information about the run-time. The following lemma shows that our algorithm and our bound 
can be better than what we were able to analyze. 

Lemma 6.2. On the collection of instances from the previous lemma, Algorithm 2 runs in time $O(n^2 N)$.

Proof. Without loss of generality, assume the hyperedge order Algorithm 2 considers is $[n] - \{1\}, \dots, [n] - \{n\}$. In this case, the universe of the left-child of the root of the QP-tree is $\{n\}$, and the universe of the right-child of the root is $[n-1]$.

The first thing Algorithm 2 does is compute the join $L_n = \bowtie_{i=1}^{n-1} \pi_{\{n\}}(R_i)$, in time $O(nN)$. Note that $L_n = D$, the domain. Next, Algorithm 2 goes through each value $a \in L_n$ and decides whether to solve a subproblem. First, consider the case $a > 0$. Here Algorithm 2 estimates a bound for the join $\bowtie_{j=1}^{n-1} \pi_{[n-1]}(R_j[a])$. The estimate is 1 because $|\pi_{[n-1]}(R_j[a])| = 1$ for all $a > 0$. Hence, the algorithm will recursively compute this join, which takes time $O(n^2)$, and filter the result against $R_n$. Overall, solving the subproblems for $a > 0$ takes $O(n^2 N)$ time. Second, consider the case when $a = 0$. In this case $|\pi_{[n-1]}(R_j[0])| = \frac{(n-2)N + 1}{n-1}$. The subproblem's estimated size bound is

$$\prod_{j=1}^{n-1} \left|\pi_{[n-1]}(R_j[0])\right|^{\frac{1/(n-1)}{1 - 1/(n-1)}} = \left( \frac{(n-2)N + 1}{n-1} \right)^{(n-1)/(n-2)} > N$$

if $N \ge 4$ and $n \ge 4$. Hence, in this case $R_n$ will be filtered against the $\pi_{[n-1]}(R_j[0])$, which takes $O(n^2 N)$ time. □



^We thank an anonymous PODS'12 referee for giving us the argument showing that our example works for all join-project plans rather than just the AGM algorithm and a join-tree algorithm.

Extending beyond LW instances. Using the above results, we give a sufficient condition for when there exists a family of instances $I = I_1, \dots, I_N, \dots$ such that on instance $I_N$ every binary join strategy takes time at least $\Omega(N^2)$, but our algorithm takes $o(N^2)$. Given a hypergraph $H = (V, E)$, we first define some notation. Fix $U \subseteq V$. Call an attribute $v \in V \setminus U$ $U$-relevant if for all $e$ such that $v \in e$ we have $e \cap U \ne \emptyset$; call $v$ $U$-troublesome if for all $e \in E$, if $v \in e$ then $U \subseteq e$. Now we can state our result:

Lemma 6.3. Let $H = (V, E)$ be a join query and let $U \subseteq V$ with $|U| \ge 2$. Suppose there exists $F \subseteq E$ with $|F| = |U|$ that satisfies the following three properties: (1) each $u \in U$ occurs in exactly $|U| - 1$ elements in $F$, (2) each $v \in V$ that is $U$-relevant appears in at least $|U| - 1$ edges in $F$, (3) there are no $U$-troublesome attributes. Then, there is some family of instances $I$ such that (a) computing the join query represented by $H$ with a join tree takes time $\Omega(N^2/|U|^2)$ while (b) the algorithm from Section 5 takes time $O(N^{1+1/(|U|-1)})$.

Given a $(U, F)$ as in the lemma, the idea is simply to set all those edges $f \in F$ to be the instances from Lemma 6.1 and extend all attributes with a single value, say $c_0$. Since there are no $U$-troublesome attributes, to construct the result set at least one of the relations in $F$ must be joined. Since any pair in $F$ must take time $\Omega(N^2/|U|^2)$ by the above construction, this establishes (a). To establish (b), we need to describe a particular feasible solution to the cover LP whose objective value is $N^{1+1/(|U|-1)}$, implying that the running time of our proposed algorithm is upper bounded by this value. To do this, we first observe that any attribute not in $U$ takes only the value $c_0$. Then, we observe that any node $v \in V$ that is not $U$-relevant is covered by some edge $e$ whose size is exactly 1 (and so we can set $x_e = 1$). Thus, we may assume that all nodes are $U$-relevant. Then, observe that all relevant attributes can be covered by setting $x_e = 1/(|U| - 1)$ for $e \in F$. This is a feasible solution to the LP and establishes our claim.

7 Extensions 



In Section 7.1 we describe some results on the combined complexity of our approach. Finally, in Section 7.2 we observe that our algorithm can be used to compute a relaxed notion of join.

7.1 Combined Complexity 

Given that our algorithms are data-optimal for worst-case inputs, it is tempting to wonder if one can obtain a join algorithm whose run time is both query and data optimal in the worst case. We show that in the special case when each input relation has arity at most 2 we can attain a data-optimal algorithm that is simpler than Algorithm 2 with an asymptotically better query complexity.

Further, given promising results in the worst case, it is natural to wonder if one can obtain a join algorithm whose run time is polynomial in both the size of the query and the size of the output. More precisely, given a join query $q$ and an instance $I$, can one compute the result of query $q$ on instance $I$ in time $\mathrm{poly}(|q|, |q(I)|, |I|)$? Unfortunately, this is not possible unless NP = RP. We briefly present a proof of this fact below.



Each relation has at most 2 attributes. As was mentioned in the introduction, our algorithm in Theorem 5.1 not only has better data complexity than AGM's algorithm (in fact we showed our algorithm has optimal worst-case data complexity), it also has a better query complexity. In this section, we show that for the special case when the join query $q$ is on relations with at most two attributes (i.e. the corresponding hypergraph $H$ is a graph), we can obtain an even better query complexity than in Theorem 5.1 (with the same optimal data complexity).

Without loss of generality, we can assume that each relation contains exactly 2 attributes because a 1-attribute relation $R_e$ needs to have $x_e = 1$ in the corresponding LP and thus contributes a separate factor $N_e$ to the final product. Thus, $R_e$ can be joined with the rest of the query with any join algorithm (including the naive Cartesian product based algorithm). In this case, the hypergraph $H$ is a graph which can be assumed to be simple.

We first prove an auxiliary lemma for the case when $H$ is a cycle. We assume that all relations are indexed in advance, which takes $O(\sum_e N_e)$ time. In what follows we will not include this preprocessing time in the analysis. The following lemma essentially reduces the case when $H$ is a cycle to the case when $H$ is a triangle, a Loomis-Whitney instance with $n = 3$.

Lemma 7.1 (Cycle Lemma). If $H$ is a cycle, then $\bowtie_{e \in H} R_e$ can be computed in time $O\big(m \sqrt{\prod_{e \in H} N_e}\big)$.

Proof. First suppose $H$ is an even cycle, consisting of consecutive edges $e_1 = (1, 2), e_2 = (2, 3), \dots, e_{2k'} = (2k', 1)$. Without loss of generality, assume

$$N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}} \le N_{e_2} N_{e_4} \cdots N_{e_{2k'}}.$$

In this case, we compute the (cross-product) join

$$R = R_{e_1} \times R_{e_3} \times \cdots \times R_{e_{2k'-1}}.$$

Note that $R$ contains all the attributes. Then, sequentially join $R$ with each of $R_{e_2}$ to $R_{e_{2k'}}$. The total running time is

$$O\big(k' N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}}\big) = O\left(m \sqrt{\prod_{e \in H} N_e}\right).$$



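Second, suppose $H$ is an odd cycle consisting of consecutive edges $e_1 = (1, 2), e_2 = (2, 3), \dots, e_{2k'+1} = (2k'+1, 1)$. If $k' = 1$ then by the Loomis-Whitney algorithm for the $n = 3$ case (Algorithm 1), we can compute $R_{e_1} \bowtie R_{e_2} \bowtie R_{e_3}$ in time $O(\sqrt{N_{e_1} N_{e_2} N_{e_3}})$. Suppose $k' > 1$. Without loss of generality, assume

$$N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}} \le N_{e_2} N_{e_4} \cdots N_{e_{2k'}}.$$

In particular, $N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}} \le \sqrt{\prod_{e \in H} N_e}$, which means the following join can be computed in time $O\big(m \sqrt{\prod_{e \in H} N_e}\big)$:

$$X = R_{e_1} \times R_{e_3} \times \cdots \times R_{e_{2k'-1}}.$$

Note that $X$ spans the attributes in the set $[2k']$. Let $S = \{2, 3, \dots, 2k' - 1\}$, and let $X_S$ denote the projection of $X$ down to the coordinates in $S$; and define

$$W = \big(\cdots\big((X_S \bowtie R_{e_2}) \bowtie R_{e_4}\big) \cdots \bowtie R_{e_{2k'-2}}\big).$$

Since $R_{e_2} \bowtie R_{e_4} \cdots \bowtie R_{e_{2k'-2}}$ spans precisely the attributes in $S$, the relation $W$ can be computed in time $O(m|X_S|) = O(m|X|) = O\big(m \sqrt{\prod_{e \in H} N_e}\big)$. Note that

$$|W| \le \min\{ N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}},\ N_{e_2} N_{e_4} \cdots N_{e_{2k'-2}} \}.$$

We claim that one of the following inequalities must hold:

$$|W| \cdot N_{e_{2k'}} \le \sqrt{\prod_{e \in H} N_e} \qquad \text{or} \qquad |W| \cdot N_{e_{2k'+1}} \le \sqrt{\prod_{e \in H} N_e}.$$

Suppose both of them do not hold, then

$$\begin{aligned}
\prod_{e \in H} N_e &= (N_{e_1} N_{e_3} \cdots N_{e_{2k'-1}}) \cdot (N_{e_2} N_{e_4} \cdots N_{e_{2k'-2}}) \cdot N_{e_{2k'}} \cdot N_{e_{2k'+1}} \\
&\ge (|W| \cdot N_{e_{2k'+1}}) \cdot (|W| \cdot N_{e_{2k'}}) \\
&> \prod_{e \in H} N_e,
\end{aligned}$$

which is a contradiction. Hence, without loss of generality we can assume $|W| \cdot N_{e_{2k'}} \le \sqrt{\prod_{e \in H} N_e}$. Now, compute the relation

$$Y = W \bowtie R_{e_{2k'}},$$

which spans the attributes $S \cup \{2k', 2k'+1\}$. Finally, by thinking of all attributes in the set $S \cup \{2k'\}$ as a "bundled attribute", we can use the Loomis-Whitney algorithm for $n = 3$ to compute the join

$$X \bowtie Y \bowtie R_{e_{2k'+1}}$$

in time linear in

$$\sqrt{|X| \cdot |Y| \cdot N_{e_{2k'+1}}} \le \sqrt{\prod_{e \in H} N_e}.$$ □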




With the help of Lemma 7.1 we can now derive a solution for the case when $H$ is an arbitrary graph. Consider any basic feasible solution $x = (x_e)_{e \in E}$ of the fractional cover polyhedron

$$\sum_{e :\, v \in e} x_e \ge 1, \quad v \in V,$$
$$x_e \ge 0, \quad e \in E.$$

It is known that $x$ is half-integral, i.e. $x_e \in \{0, 1/2, 1\}$ for all $e \in E$ (see Schrijver's book [31], Theorem 30.10). However, we will also need a graph structure associated with the half-integral solution; hence, we adapt a known proof [31] of the half-integrality property with a slightly more specific analysis. It should be noted, however, that the following is already implicit in the existing proof.

Lemma 7.2. For any basic feasible solution $x = (x_e)_{e \in E}$ of the fractional cover polyhedron above, $x_e \in \{0, 1/2, 1\}$ for all $e \in E$. Furthermore, the collection of edges $e$ for which $x_e = 1$ is a union $S$ of stars. And, the collection of edges $e$ for which $x_e = 1/2$ forms a set $C$ of vertex-disjoint odd-length cycles that are also vertex-disjoint from the union $S$ of stars.

Proof. First, if some $x_e = 0$, then we remove $e$ from the graph and recurse on $H - e$. The new $x$ is still an extreme point of the new polyhedron. So we can assume that $x_e > 0$ for all $e \in E$.

Second, we can also assume that $H$ is connected. Otherwise, we consider each connected component separately.

Let $k = |V|$ and $m = |E|$. The polyhedron is defined on $m$ variables and $k + m$ inequality constraints. The extreme point must be the intersection of exactly $m$ (linearly independent) tight constraints. But the constraints $x \ge 0$ are not tight as we have assumed $x_e > 0$ for all $e$. Hence, there must be $m$ vertices $v$ for which the constraints $\sum_{e :\, v \in e} x_e \ge 1$ are tight. In particular, $m \le k$. Since $H$ is connected, it is either a tree, or has exactly one cycle.

Suppose $H$ is a tree; then it has at least 2 leaves and at most one non-tight constraint (as there must be $m = k - 1$ tight constraints). Consider a leaf $u$ whose constraint is tight. Let $v$ be $u$'s neighbor. Then $x_{uv} = 1$ because $u$ is tight. If $v$ is tight then we are done, the graph $H$ is just an edge $uv$. (If there was another edge $e$ incident to $v$ then $x_e = 0$.) If $v$ is not tight then $v$ is not a leaf. We start from another tight leaf $w \ne u$ of the tree and reason in the same way. Then, $w$ has to be connected to $v$. Overall, the graph is a star.

Next, consider the case when $H$ is not a tree. All $k = m$ vertices have to be tight in this case. Thus, there cannot be a degree-1 vertex, by the same reasoning as above. Thus, $H$ is a cycle. If $H$ is an odd cycle then it is easy to show that the only solution for which all vertices are tight is the all-1/2 solution. If $H$ is an even cycle then $x$ cannot be an extreme point because it can be written as $x = (y + z)/2$ for feasible solutions $y$ and $z$ (just add and subtract $\epsilon$ from alternate edges to form $y$ and $z$). □

Now, let $x^*$ be an optimal basic feasible solution to the following linear program.

$$\begin{aligned}
\min\ & \sum_{e} (\log N_e) \cdot x_e \\
\text{s.t.}\ & \sum_{e :\, v \in e} x_e \ge 1, \quad v \in V, \\
& x_e \ge 0, \quad e \in E.
\end{aligned}$$

Then $\prod_{e \in E} N_e^{x^*_e} \le \prod_{e \in E} N_e^{x_e}$ for any feasible fractional cover $x$. Let $S$ be the set of edges on the stars and $C$ be the collection of disjoint cycles as shown in the above lemma, applied to $x^*$. Then,

$$\prod_{e \in E} N_e^{x^*_e} = \left( \prod_{e \in S} N_e \right) \prod_{C' \in C} \sqrt{\prod_{e \in C'} N_e}.$$

Consequently, we can apply Lemma 7.1 to each cycle $C' \in C$ and take a cross product of all the resulting relations with the relations $R_e$ for $e \in S$. We just proved the following theorem.

Theorem 7.3. When each relation has at most two attributes, we can compute the join $\bowtie_{e \in E} R_e$ in time $O\big(m \prod_{e \in E} N_e^{x^*_e}\big)$.
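Because Lemma 7.2 guarantees that basic feasible solutions are half-integral, the optimum of the cover LP on a small graph can be found by plain enumeration over $\{0, 1/2, 1\}^E$. The Python sketch below does exactly that; it is an illustration under stated assumptions, not the paper's method. The function name `best_half_integral_cover` is hypothetical, the search is exponential in $|E|$, and a real implementation would call an LP solver instead.

```python
from itertools import product
from math import prod

def best_half_integral_cover(vertices, edges, sizes):
    """Return (bound, x) minimising prod_e N_e^{x_e} over half-integral covers.

    `edges` is a list of vertex pairs and `sizes[i]` is N_e for `edges[i]`.
    Brute force over {0, 1/2, 1}^E, so only suitable for tiny graphs.
    """
    best = None
    for xs in product((0.0, 0.5, 1.0), repeat=len(edges)):
        # Feasibility: every vertex must receive total weight at least 1.
        if all(sum(x for x, e in zip(xs, edges) if v in e) >= 1 for v in vertices):
            bound = prod(n ** x for n, x in zip(sizes, xs))
            if best is None or bound < best[0]:
                best = (bound, xs)
    return best
```

On a triangle with all sizes equal, the minimiser is the all-1/2 solution (an odd cycle in the sense of Lemma 7.2); on a path with two edges, both edges receive weight 1 (a star).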

Impossibility of Instance Optimality. The proof is fairly standard: we use the standard reduction of 3SAT to conjunctive queries but with two simple specializations: (i) we reduce from 3UniqueSAT, where the input formula is either unsatisfiable or has exactly one satisfying assignment, and (ii) $q$ is a full join query instead of a general conjunctive query. It is known that 3UniqueSAT cannot be solved in deterministic polynomial time unless NP = RP [34].

For the sake of completeness, we sketch the reduction here. Let $\varphi = C_1 \wedge C_2 \wedge \dots \wedge C_m$ be a 3UniqueSAT CNF formula on $n$ variables $a_1, \dots, a_n$. (W.l.o.g. assume that a clause does not contain both a variable and its negation.) For each clause $C_j$ for $j \in [m]$, create a relation $R_j$ on the variables that occur in $C_j$. The query $q$ is

$$\bowtie_{j \in [m]} R_j.$$

Now define the database $I$ as follows: for each $j \in [m]$, $R_j$ contains the seven assignments to the variables in $C_j$ that make it true. Note that $q(I)$ contains all the satisfying assignments for $\varphi$: in other words, $q(I)$ has one element if $\varphi$ is satisfiable; otherwise $q(I) = \emptyset$. In other words, we have $|q(I)| \le 1$, $|q| = O(m + n)$ and $|I| = O(m)$. Thus an instance optimal algorithm with time complexity $\mathrm{poly}(|q|, |q(I)|, |I|)$ for $q$ would be able to determine if $\varphi$ is satisfiable or not in time $\mathrm{poly}(n, m)$, which would imply NP = RP.
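The reduction can be made concrete with a few lines of Python. This is a sketch under assumptions: clauses are encoded as tuples of non-zero integers in the DIMACS style (a negative number is a negated variable), tuples are sorted sequences of `(variable, value)` pairs, and the helper names are hypothetical. The join used here is a naive nested loop and is only meant to show that the query output equals the set of satisfying assignments.

```python
from itertools import product

def clause_relation(clause):
    """Relation R_j: all assignments to the clause's variables that satisfy it."""
    variables = tuple(sorted(abs(lit) for lit in clause))
    rows = set()
    for bits in product((False, True), repeat=len(variables)):
        assign = dict(zip(variables, bits))
        # A clause is true if at least one literal evaluates to true.
        if any(assign[abs(lit)] == (lit > 0) for lit in clause):
            rows.add(tuple(assign.items()))
    return rows

def natural_join(r, s):
    """Naive natural join on tuples of sorted (variable, value) pairs."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out

def satisfying_assignments(clauses):
    """Evaluate the full join query q over the clause relations."""
    result = {()}  # the join identity: one empty tuple
    for clause in clauses:
        result = natural_join(result, clause_relation(clause))
    return result
```

Each relation built from a clause on three distinct variables has exactly seven rows, matching the construction in the text, and the join is empty precisely when the formula is unsatisfiable.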

7.2 Relaxed Joins 

We observe that our algorithm can actually evaluate a relaxed notion of join queries. Say we are given a query $q$ represented by a hypergraph $H = (V, E)$ where $V = [n]$ and $|E| = m$. The $m$ input relations are $R_e$, $e \in E$. We are also given a "relaxation" number $0 \le r \le m$. Our goal is to output all tuples that agree with at least $m - r$ input relations. In other words, we want to compute $\bigcup_{S \subseteq E,\, |S| \ge m - r} \bowtie_{e \in S} R_e$. However, we need to modify the problem to avoid the case that the set of attributes of relations indexed by $S$ does not cover all the attributes in the universe $V$. Towards this end, define the set

$$C(q, r) = \Big\{ S \subseteq E \ \Big|\ |S| \ge m - r \text{ and } \bigcup_{e \in S} e = V \Big\}.$$

With the notations established above, we are now ready to define the relaxed join problem.

Definition 7.4 (Relaxed join problem). Given a query $q$ represented by the hypergraph $H = (V = [n], E)$, and an integer $0 \le r \le m$, evaluate

$$q_r := \bigcup_{S \in C(q, r)} \big( \bowtie_{e \in S} R_e \big).$$

Before we proceed, we first make the following simple observation: given any two sets $S, T \in C(q, r)$ such that $S \subseteq T$, we have $\bowtie_{e \in T} R_e \subseteq \bowtie_{e \in S} R_e$. This means that in the relaxed join problem we only need to consider subsets of relations that do not contain any other subset in $C(q, r)$. In particular, define $\hat{C}(q, r) \subseteq C(q, r)$ to be the largest subset of $C(q, r)$ such that for any $S \ne T \in \hat{C}(q, r)$ neither $S \subset T$ nor $T \subset S$. We only need to evaluate $q_r = \bigcup_{S \in \hat{C}(q, r)} \big( \bowtie_{e \in S} R_e \big)$.
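Definition 7.4 together with the observation about inclusion-minimal sets can be turned into a direct, brute-force evaluator. The sketch below is illustrative only: it enumerates all subsets of $E$ (exponential in $m$), uses a naive join, and all helper names and the `(attribute, value)` tuple encoding are assumptions of this example rather than anything defined in the paper.

```python
from itertools import combinations

def natural_join(r, s):
    """Naive natural join on tuples of sorted (attribute, value) pairs."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out

def minimal_covering_sets(edges, universe, r):
    """Inclusion-minimal index sets S with |S| >= m - r whose edges cover V."""
    m = len(edges)
    covering = [
        frozenset(c)
        for size in range(max(m - r, 0), m + 1)
        for c in combinations(range(m), size)
        if set().union(*(edges[i] for i in c)) == set(universe)
    ]
    # Drop every set that strictly contains another covering set.
    return [s for s in covering if not any(t < s for t in covering)]

def relaxed_join(edges, rels, universe, r):
    """Union of the joins over all minimal covering sets (Definition 7.4)."""
    out = set()
    for s in minimal_covering_sets(edges, universe, r):
        acc = {()}
        for i in s:
            acc = natural_join(acc, rels[i])
        out |= acc
    return out
```

With $r = 0$ this reduces to the ordinary natural join of all relations; larger $r$ can only add tuples.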

Given an $S \in C(q, r)$, let $\mathrm{LPOpt}(S)$ denote the optimal size bound given by AGM's fractional cover inequality (2) on the join query represented by the hypergraph $(V, S)$. In particular, $\mathrm{LPOpt}(S) = \prod_{e \in S} |R_e|^{x^*_e}$ where $x^*_S = (x^*_e)_{e \in S}$ is an optimal solution to the following linear program called $\mathrm{LP}(S)$:

$$\begin{aligned}
\min\ & \sum_{e \in S} (\log |R_e|) \cdot x_e \\
\text{subject to}\ & \sum_{e \in S :\, v \in e} x_e \ge 1 \quad \text{for any } v \in V, \qquad (6) \\
& x_e \ge 0 \quad \text{for any } e \in S.
\end{aligned}$$

Upper bounds. We start with a straightforward upper bound.

Proposition 7.5. Let $q$ be a join query on $m$ relations and let $0 \le r \le m$ be an integer. Then given sizes of the input relations, the number of output tuples for query $q_r$ is upper bounded by

$$\sum_{S \in C(q, r)} \mathrm{LPOpt}(S).$$

Further, Algorithm 2 evaluates $q_r$ with data complexity linear in the bound above. The next natural question is to determine how good the upper bound is. Before we answer the question, we prove a stronger upper bound.

Given a subset of hyperedges $S \subseteq E$ which "covers" $V$, i.e. $\bigcup_{e \in S} e = V$, let $\mathrm{BFS}(S) \subseteq S$ be the subset of hyperedges in $S$ that get a positive $x^*_e$ value in an optimal basic feasible solution to the linear program $\mathrm{LP}(S)$ defined in (6). (If there are multiple such solutions, pick any one in a consistent manner.) Call two subsets $S, T \subseteq E$ bfs-equivalent if $\mathrm{BFS}(S) = \mathrm{BFS}(T)$. Finally, define $C^*(q, r) \subseteq C(q, r)$ as the collection of sets from $C(q, r)$ which contains exactly one arbitrary representative from each bfs-equivalence class.

Theorem 7.6. Let $q$ be a join query represented by $H = (V, E)$, and let $0 \le r \le m$ be an integer. The number of output tuples of $q_r$ is upper bounded by $\sum_{S \in C^*(q, r)} \mathrm{LPOpt}(S)$. Further, the query $q_r$ can be evaluated in time

$$O\left( \sum_{S \in C^*(q, r)} \big( mn \cdot \mathrm{LPOpt}(S) + \mathrm{poly}(n, m) \big) \right)$$

plus the time needed to compute $C^*(q, r)$ from $q$.

Note that since $C^*(q, r) \subseteq C(q, r)$, the bound in Theorem 7.6 is no worse than that in Proposition 7.5. We will show later that the bound in Theorem 7.6 is indeed tight.



Proof of Theorem 7.6. We will prove the result by presenting the algorithm to compute $q_r$. A simple yet key idea is the following. Let $S \ne S' \in C(q, r)$ be two different sets of hyperedges with the following property. Define $T := \mathrm{BFS}(S) = \mathrm{BFS}(S')$ and let $x^*_T = (x^*_i)_{i \in T}$ be the projection of the corresponding optimal basic feasible solution to the $(V, S)$ and the $(V, S')$ problems projected down to $T$. (The two projections result in the same vector $x^*_T$.) The outputs of the joins on $S$ and on $S'$ are both subsets of the output of the join on $T$. We can simply run Algorithm 2 on inputs $(V, T)$ and $x^*_T$, then prune the output against relations $R_e$ with $e \in S \setminus T$ or $S' \setminus T$. In particular, we only need to compute $\bowtie_{e \in T} R_e$ once for both $S$ and $S'$.

Other than the time to compute $C^*(q, r)$ in line 1, line 4 needs $\mathrm{poly}(n, m)$ time to solve the LP, line 5 needs $O(m)$ time, while by Theorem 5.1 line 6 will take $O(mn \cdot \mathrm{LPOpt}(S) + m^2 n)$ time. Finally, Theorem 5.1 shows that $|\phi_T| \le \mathrm{LPOpt}(S)$,^ which shows that the loop in line 7 is repeated at most $\mathrm{LPOpt}(S)$ times, and lines 8-9 can be implemented in $O(m)$ time; thus, lines 7-9 will take time $O(m \cdot \mathrm{LPOpt}(S))$.

Finally, we argue the correctness of the algorithm. We first note that by line 8, every tuple $t$ that is output is indeed a correct one. Thus, we have to argue that we do not miss any tuple $t$ that needs to be output. For the sake of contradiction assume that there exists such a tuple $t$. Note that by definition of $C(q, r)$, this implies that there exists a set $S' \in C(q, r)$ such that for every $e \in S'$, $t_e \in R_e$. However, note that by definition of $C^*(q, r)$, for some execution of the loop in line 3, we will consider $T$ such that $T = \mathrm{BFS}(S')$. Further, by the correctness of Algorithm 2 we have that $t \in \phi_T$. This implies (along with the definition of $C(q, r)$) that $t$ will be retained in line 8, which is a contradiction. □

It is easy to check that one can compute $C^*$ in time $m^{O(r)}$ (by going through all subsets of $E$ of size at least $m - r$ and performing all the required checks). We leave open the question of whether this time bound can be improved.

^This also proves the claimed bound on the size of $q_r$.

Algorithm 6 Computing Relaxed Join $q_r$

1: Compute $C^*(q, r)$.
2: $Q \leftarrow \emptyset$
3: for every $S \in C^*(q, r)$ do
4:     Let $x^*$ be an optimal BFS for $\mathrm{LP}(S)$
5:     Let $T = \{e \in S \mid x^*_e > 0\}$. (Note that $T = \mathrm{BFS}(S)$.)
6:     Run Algorithm 2 on $(x^*_e)_{e \in T}$ to compute $\phi_T = \bowtie_{e \in T} R_e$.
7:     for every tuple $t \in \phi_T$ do
8:         if for at least $m - r$ hyperedges $e \in E$, $t_e \in R_e$ then
9:             $Q \leftarrow Q \cup \{t\}$
10: return $Q$



Lower bound. We now show that the bound in Theorem 7.6 is tight for some query and some database instance $I$.

We first define the query $q$. The hypergraph is $H = (V = [n], E)$ where $m = |E| = n + 1$. The hyperedges are $E = \{e_1, \dots, e_{n+1}\}$ where $e_i = \{i\}$ for $i \in [n]$ and $e_{n+1} = [n]$. The database instance $I$ consists of relations $R_e$, $e \in E$, all of which are of size $N$. For each $i \in [n]$, $R_{e_i} = [N]$. And, $R_{e_{n+1}} = \bigcup_{i=1}^{N} \{N + i\}^n$.

It is easy to check that for any $r > 0$, $q_r(I)$ is the set $R_{e_{n+1}} \cup [N]^n$, i.e. $|q_r(I)| = N + N^n$. Next, we claim that for this query instance $C^*(q, r) = \{\{n+1\}, [n]\}$. Note that $\mathrm{BFS}(\{n+1\}) = \{n+1\}$ and $\mathrm{BFS}([n]) = [n]$, which implies that $\mathrm{LPOpt}(\{n+1\}) = N$ and $\mathrm{LPOpt}([n]) = N^n$. This along with Theorem 7.6 implies that $|q_r(I)| \le N + N^n$, which proves the tightness of the size bound in Theorem 7.6 as desired.

Finally, we argue that $C^*(q, r) = \{\{n+1\}, [n]\}$. Towards this end, consider any $T \in C(q, r)$. Note that if $(n+1) \notin T$, we have $T = [n]$, and since $\mathrm{BFS}(T) = T$ (and we will see soon that for any other $T \in C(q, r)$, we have $\mathrm{BFS}(T) \ne [n]$), this implies that $[n] \in C^*(q, r)$. Now consider the case when $(n+1) \in T$. Note that in this case $T = \{n+1\} \cup T'$ for some $T' \subset [n]$ such that $|T'| \ge n - r$. Now note that the relations in $T'$ cannot cover the $n$ attributes but $R_{n+1}$ by itself does include all the $n$ attributes. This implies that $\mathrm{BFS}(T) = \{n+1\}$ in this case. This proves that $\{n+1\}$ is the other element in $C^*(q, r)$, as desired.

Finally, if one wants a more general example where $m = n + k$ for $k > 1$, then one can repeat the above instance $k$ times, where each repetition has $n/k$ fresh attributes. In this case, $C^*$ will consist of all subsets of relations where in each repetition, each such subset has exactly one of $\{n/k + 1\}$ or $[n/k]$. In particular, the query output size will be

7.3 Dealing with full queries and simple functional dependencies 

Full query processing. Our goal in this section is to handle a more general class of queries that may contain selections and joins to the same table, which we describe now.

Our notation in this section follows Gottlob et al.'s [11] notation, and we reproduce it here for the sake of completeness. A database instance $I = (\mathcal{U}, R_1, \dots, R_m)$ consists of a finite universe of constants $\mathcal{U}$ and relations $R_1, \dots, R_m$ each over $\mathcal{U}$. A conjunctive query has the form $q = R(u_0) \leftarrow R_{i_1}(u_1) \wedge \cdots \wedge R_{i_m}(u_m)$, where each $u_j$ is a list of (not necessarily distinct) variables of length $|u_j|$. We call each $R_{i_j}(u_j)$ a subgoal. Each variable that occurs in the head $R(u_0)$ must also appear in the body. We call a conjunctive query full if each variable that appears in the body also appears in the head. The set of all variables in $q$ is denoted $\mathrm{var}(q)$. A single relation may occur several times in the body, and so we may have $i_j = i_k$ for some $j \ne k$. The answer of a query $q$ over a database instance $I$ is a set of tuples of arity $|u_0|$, which is denoted $q(I)$, and is defined to contain exactly those tuples $\theta(u_0)$ where $\theta : \mathrm{var}(q) \to \mathcal{U}$ is any substitution such that for each $j = 1, \dots, m$, $\theta(u_j) \in R_{i_j}$.

We call a full conjunctive query reduced if no variable is repeated in the same subgoal. We can assume without loss of generality that a full conjunctive query is reduced since we can create an equivalent reduced query within the time bound. In time $O(|R_{i_j}|)$ for each $j = 1, \dots, m$, we create a new relation $R'_j$ with arity equal to the number of distinct variables. In one scan over $R_{i_j}$ we can produce $R'_j$ by keeping only those tuples that satisfy the constants (selections) in the query and any repeated variables. We then construct $q'$, a query over the $R'_j$, in the obvious way. Clearly $q(I) = q'(I)$ and we can construct both in a single scan over the input. Finally, we make the observation that our method can tolerate multisets as hypergraphs, and so our results extend to full conjunctive queries. Summarizing our discussion, we have a worst-case optimal algorithm for full conjunctive queries as well.
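The single-scan reduction of a subgoal can be sketched as follows. This is an illustration under assumptions, not the paper's code: the function name `reduce_subgoal` is hypothetical, variables are represented as Python strings, and anything that is not a string in the argument list is treated as a constant (so constants must not be strings in this toy encoding).

```python
def reduce_subgoal(relation, args):
    """Reduce one subgoal R(args) in a single scan over `relation`.

    `args` mixes variable names (str) and constants (non-str). Returns the
    list of distinct variables and the reduced relation R' over them.
    """
    variables = []
    for a in args:
        if isinstance(a, str) and a not in variables:
            variables.append(a)
    reduced = set()
    for row in relation:
        binding = {}
        ok = True
        for a, value in zip(args, row):
            if isinstance(a, str):
                # A repeated variable must see the same value each time.
                if binding.setdefault(a, value) != value:
                    ok = False
                    break
            elif a != value:
                # A constant acts as a selection on this column.
                ok = False
                break
        if ok:
            reduced.add(tuple(binding[v] for v in variables))
    return variables, reduced
```

For example, the subgoal $R(x, x, 5)$ keeps only rows whose first two columns agree and whose third column equals 5, and the reduced relation has a single column for $x$.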

Simple Functional Dependencies. Given a join query $(V, E)$, a (simple) functional dependency (FD) is a triple $(e, u, v)$ where $u, v \in V$ and $e \in E$, and is written as $e.u \to e.v$. It is a constraint in that the FD $(e, u, v)$ implies that for any pair of tuples $t, t' \in R_e$, if $t_u = t'_u$ then $t_v = t'_v$. Fix a set of functional dependencies $\Gamma$, and construct a directed (multi-)graph $G(\Gamma)$ where the nodes are the attributes $V$ and there is an edge $(u, v)$ for each functional dependency. The set of all nodes reachable from a node $u$ is a set $U$ of nodes; this relationship is denoted $u \to^* U$.

Given a set of functional dependencies, we propose an algorithm to process a join query. The first step is to compute for each relation $R_e$, $e \in E$, a new relation $R'_e$ whose attributes are the union of the closures of the elements $v \in e$, i.e., $e' = \{u \mid v \to^* u \text{ for } v \in e\}$. Using the closure this can be computed in time $|E||V|$. Then, we compute the contents of $R'_e$. Walking the graph induced by the FDs in a breadth-first manner, we can expand $R_e$ to contain all the attributes of $R_{e'}$ in time linear in the input size. Finally, we solve the LP from the previous section and use our algorithm. It is clear that this algorithm is a strict improvement over our previous algorithm that is FD-unaware. It is an open question to understand its data optimality. We are, however, able to give an example that suggests this algorithm can be substantially better than algorithms that are not FD-aware.
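The attribute-closure step is a plain breadth-first search over the FD graph. The following Python sketch shows only that step (expanding the contents of the relations is omitted); the name `fd_closure` and the encoding of FDs as `(u, v)` pairs are assumptions made for illustration.

```python
from collections import deque

def fd_closure(attrs, fds):
    """Attributes reachable from `attrs` along FD edges (u, v) meaning u -> v."""
    successors = {}
    for u, v in fds:
        successors.setdefault(u, []).append(v)
    seen = set(attrs)
    queue = deque(attrs)
    while queue:
        u = queue.popleft()
        for v in successors.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen
```

With dependencies $A \to B_i$ for every $i$, the closure of the attribute set $\{A, B_1\}$ is $\{A, B_1, \dots, B_k\}$, whereas the closure of $\{B_1, C\}$ is unchanged.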

Consider the following family of instances on ^ + 2 attributes A,Bi, . . . ,Bk,C parameterized by N\ 



q = {xl, Ri(A,Bt)) X {xl, Si(Bt,C)) 



Now we construct a family of instances such that $|R_i| = |S_i| = N$ for $i = 1, \dots, k$. Suppose there are functional 
dependencies $A \to B_i$. 

Our algorithm will first produce a relation $R'(A, B_1, \dots, B_k)$, which can then be joined in time $N$ with each relation 
$S_i$ for $i = 1, \dots, k$. When we solve the LP, we get a bound of $|q(I)| \le N^2$, and our algorithm runs within this time. 
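
As a sanity check on this bound, the fractional edge cover LP for the FD-expanded query can be written out explicitly. The following is a sketch under our reading of the construction, which the text does not spell out: each expanded relation $R'_i$ has attributes $\{A, B_1, \dots, B_k\}$ and still has size $N$, each $S_i$ is unchanged, and $x_{R'_i}$, $x_{S_i}$ are the (assumed) names of the LP variables.

```latex
\begin{align*}
\min\ & \sum_{i=1}^{k} \left( x_{R'_i} + x_{S_i} \right) \log N \\
\text{s.t.}\ & \sum_{i=1}^{k} x_{R'_i} \ge 1 && \text{(attribute } A\text{)} \\
& \sum_{j=1}^{k} x_{R'_j} + x_{S_i} \ge 1 && \text{(attribute } B_i,\ i = 1, \dots, k\text{)} \\
& \sum_{i=1}^{k} x_{S_i} \ge 1 && \text{(attribute } C\text{)} \\
& x_{R'_i},\ x_{S_i} \ge 0 .
\end{align*}
```

Setting $x_{R'_1} = x_{S_1} = 1$ and all other variables to $0$ is feasible with objective $2 \log N$, and the constraints for $A$ and $C$ together force any feasible solution to have total weight at least $2$. Under these assumptions the optimum is $2 \log N$, which corresponds to an output bound of $N^2$.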

Now consider the original instance without functional dependencies. Then, the AGM bound is $|q(I)| \le N^k$. More 
interestingly, one can construct a simple instance where half of the join has a huge size, that is, $|\bowtie_{i=1}^{k} S_i(B_i, C)| = N^k$. 
Thus, if we choose the wrong join ordering, our algorithm's running time will blow up. 
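
The gap between the two join orders can be seen on a small concrete instance. The sketch below is illustrative only: the instance (each $R_i$ pairs every value of $A$ with an equal value of $B_i$, and each $S_i$ pairs every value of $B_i$ with a single value of $C$) and the helper names `build_instance`, `join_s_only`, and `join_fd_aware` are assumptions chosen for the example, not constructions taken from the paper.

```python
from itertools import product


def build_instance(n, k):
    """Build R_1..R_k and S_1..S_k, each with exactly n tuples."""
    # R_i(A, B_i): A = a determines B_i = a, so the FD A -> B_i holds.
    rs = [[(a, a) for a in range(n)] for _ in range(k)]
    # S_i(B_i, C): every B_i value is paired with the single C value 0.
    ss = [[(b, 0) for b in range(n)] for _ in range(k)]
    return rs, ss


def join_s_only(ss):
    """Join all S_i(B_i, C) on their only shared attribute, C."""
    groups = []
    for s in ss:
        by_c = {}
        for b, c in s:
            by_c.setdefault(c, []).append(b)
        groups.append(by_c)
    common_c = set(groups[0]).intersection(*groups[1:])
    return [bs + (c,)
            for c in common_c
            for bs in product(*(g[c] for g in groups))]


def join_fd_aware(rs, ss):
    """Merge the R_i into R'(A, B_1..B_k) first, then probe each S_i."""
    maps = [dict(r) for r in rs]  # A -> B_i, well defined because of the FD
    common_a = set(maps[0]).intersection(*maps[1:])
    r_prime = [(a,) + tuple(m[a] for m in maps) for a in common_a]

    s_index = []
    for s in ss:
        idx = {}
        for b, c in s:
            idx.setdefault(b, set()).add(c)
        s_index.append(idx)

    out = []
    for row in r_prime:
        cs = None
        for i, idx in enumerate(s_index):
            found = idx.get(row[1 + i], set())
            cs = found if cs is None else cs & found
        out.extend(row + (c,) for c in cs)
    return out
```

With `rs, ss = build_instance(4, 3)`, joining the $S_i$ alone materializes $4^3$ tuples, whereas the FD-aware order never holds more than 4 tuples of $R'$ before probing the $S_i$.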

8 Conclusion and Future Work 

In this work, we established join algorithms that are optimal in the worst case. We also demonstrated 
that the join algorithms employed in RDBMSes do not achieve these optimal bounds, and we demonstrated families 
of instances where they are asymptotically worse by factors close to the size of the largest relation. It is interesting 
to ask similar questions for average-case complexity. Our work offers a fundamentally different way to approach 
join optimization than the traditional binary-join/dynamic-programming-based approach. Thus, our immediate 
future work is to implement these ideas to see how they compare in real RDBMS settings to the algorithms in a modern 
RDBMS. 

Another interesting direction is to extend these results to larger classes of queries and to database schemata that 
have constraints. We include in the appendix some preliminary results on full conjunctive queries and simple functional 
dependencies (FDs). Not surprisingly, using dependency information one can obtain tighter bounds compared to the 
(FD-unaware) fractional cover technique. We will also investigate whether our algorithm for computing relaxed joins 
can be useful in related contexts such as those considered in Koudas et al. [22]. 

There are potentially interesting connections between our work and several inter-related topics, all of which are 
worth exploring further. We algorithmically proved AGM's bound, which is equivalent to the BT inequality, which in 
turn is essentially equivalent to Shearer's entropy inequality. There are known combinatorial interpretations of entropy 
inequalities, of which Shearer's is a special case; for example, Alon et al. [2] derived some such connections using a 
notion of "sections" similar to what we used in this paper. An analogous partitioning procedure was used in [27] to 
compute joins by relating the number of solutions to submodular functions. Our lead example (the LW inequality with 
$n = 3$) is equivalent to the problem of enumerating all triangles in a tri-partite graph. It was known that this can be 
done in time $O(N^{3/2})$ [3]. 